Group 5 Assignment for IR


MIZAN-TEPI UNIVERSITY, TEPI CAMPUS

SCHOOL OF COMPUTING AND INFORMATICS


DEPARTMENT OF INFORMATION TECHNOLOGY
Information Retrieval Assignment
Title: Query Expansion
Group members
Name ID Number
1. Solomon Fekede-------------------------------310/11
2. Teju Sitotaw--------------------------------------328/11.
3. Desta Nigusse------------------------------------117/11
4. Tefera Misganaw--------------------------------325/11
5. Lemlem Kebede---------------------------------215/10
Submission date: 01/11/21 G.C.
Submitted to: Mr. Tilahun

Mizan Tepi University Prepared by It 3rd year batch Group 5 Page 1

Contents
1. Abstract
2. Introduction
3. What is Query Expansion?
4. Applications of Query Expansion
5. Query Expansion Working Methodology
6. Classification of Query Expansion Approaches
7. Advantages of Query Expansion
8. Conclusion
9. References



1 Abstract
With the ever-increasing size of the web, extracting relevant information from the Internet with a
query formed by a few keywords has become a big challenge. To overcome this, query
expansion (QE) plays a crucial role in improving Internet searches: the user’s initial
query is reformulated into a new query by adding meaningful terms with similar significance.
QE, as part of information retrieval (IR), has long attracted researchers’ attention. It has also
become very influential in the fields of personalized social documents, Question Answering over
Linked Data (QALD), and the Text Retrieval Conference (TREC) and REAL sets.

2 Introduction
There is a huge amount of data available on the Internet, and it is growing exponentially. This
unconstrained growth of information has not been accompanied by a corresponding expansion of
approaches for extracting relevant information. Often, a web search does not yield relevant
results. There are multiple reasons for this. First, the keywords submitted by the user can be
related to multiple topics, so the search results are not focused on the topic of interest. Second, the
query can be too short to express appropriately what the user is looking for. This can happen
simply as a matter of habit (the average size of a web search is 2.4 words). Third, the user is often
not sure what they are looking for until they see the results. Fourth, even if the user knows
what they are searching for, they may not know how to formulate the appropriate query.

3 What is Query Expansion?


Query expansion is a method for improving retrieval performance by supplementing an original
query with additional terms. Expansion can take place in the initial query formulation, in the query
reformulation stage of the online search, or in both, and it can be performed manually,
automatically, or interactively.

Query expansion reformulates the user’s original query to enhance information retrieval
effectiveness. Let a user query consist of $n$ terms, $Q = \{t_1, t_2, \ldots, t_i, t_{i+1}, \ldots, t_n\}$.
The reformulated query can have two components: the addition of new terms
$T' = \{t'_1, t'_2, \ldots, t'_m\}$ from the data source(s) $D$, and the removal of stop words
$T'' = \{t_{i+1}, t_{i+2}, \ldots, t_n\}$. The reformulated query can be represented as:

$$Q_{exp} = (Q - T'') \cup T' = \{t_1, t_2, \ldots, t_i, t'_1, t'_2, \ldots, t'_m\}$$
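
The set expression for the reformulated query can be illustrated with a small Python sketch. The stop-word list, the example query, and the function name `expand_query` are assumptions made only for illustration; they are not part of the original text.

```python
# Hypothetical stop-word list T'' candidates, chosen only for illustration.
STOP_WORDS = {"the", "of", "in", "a", "for"}

def expand_query(query_terms, new_terms, stop_words=STOP_WORDS):
    """Return Q_exp = (Q - T'') ∪ T' as an ordered list without duplicates."""
    # Q - T'': drop the stop words from the original query.
    expanded = [t for t in query_terms if t not in stop_words]
    # ∪ T': append each new term that is not already present.
    for t in new_terms:
        if t not in expanded:
            expanded.append(t)
    return expanded

q = ["history", "of", "the", "internet"]   # original query Q
t_new = ["web", "network"]                  # expansion terms T' (assumed)
print(expand_query(q, t_new))
```

A list is used instead of a Python set only so that the term order stays readable; the result has the same members as the set expression.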

4 Applications of Query Expansion

Beyond the key area of information retrieval, there are other recent applications where the QE
technique has proved beneficial. These are described as follows:

• Personalized Social Documents
• Question Answering
• Cross-Language Information Retrieval (CLIR)
• Information Filtering
• Multimedia Information Retrieval
• Other Applications

4.1 Personalized Social Documents


In recent years, social tagging systems have become popular for sharing, tagging,
commenting on, and rating multimedia content. Every user wants to find relevant information
according to their interests and commitments. This has created a need for a query expansion
framework based on social bookmarking and tagging systems, which enhance document
representation.

4.2 Question Answering


Question Answering (QA) has become a very influential research area in the field of information
retrieval. The main objective of QA is to give a quick answer in response to a user’s query.
Here the focus is on keeping the answer concise rather than retrieving all relevant documents.

4.3 Cross-Language Information Retrieval (CLIR)


CLIR is the part of IR that retrieves information written in languages different from the user’s
query language. For example, a user’s query may be in Hindi while the relevant information
retrieved is in English.

4.4 Information Filtering


Information filtering (IF) is a method to eliminate inessential information from the
entire dataset and deliver the relevant results to the end user. Information filtering is
widely used in various domains such as Internet search, e-mail, e-commerce, and
multimedia distributed systems. There are two basic approaches to information
filtering:

• Content-based filtering
A content discovery platform is an implemented software recommendation platform which uses
recommender system tools. It utilizes user metadata in order to discover and recommend
appropriate content.

• Collaborative filtering
Collaborative filtering (CF) is a technique used by recommender systems [1]. Collaborative
filtering has two senses, a narrow one and a more general one [2]. In the newer, narrower sense,
collaborative filtering is a method of making automatic predictions (filtering) about the
interests of a user by collecting preferences or taste information from many users
(collaborating).
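
The narrow sense of collaborative filtering can be sketched in Python as follows. This is only a minimal illustration: the ratings table, the use of cosine similarity, and the function names `cosine` and `predict` are assumptions, not something the text prescribes.

```python
import math

# Toy ratings (hypothetical data): user -> {item: rating}.
RATINGS = {
    "alice": {"doc1": 5, "doc2": 3, "doc3": 4},
    "bob":   {"doc1": 4, "doc2": 2, "doc3": 5, "doc4": 4},
    "carol": {"doc1": 1, "doc2": 5, "doc4": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None

# Predict how alice would rate doc4, which she has not rated herself.
print(predict("alice", "doc4"))
```

Because alice's ratings resemble bob's more than carol's, the prediction leans toward bob's rating of doc4.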

4.5 Multimedia Information Retrieval


Multimedia information retrieval (MIR) deals with searching and extracting semantic information
from multimedia documents such as audio, video, and images (see [161] for a review). Most
MIR systems rely on text-based search over the text attached to multimedia documents, such as
titles, captions, anchor text, annotations, and the surrounding HTML or XML description. This
approach can fail when metadata is absent or when the metadata cannot exactly describe the actual
multimedia content. Hence, query expansion plays a crucial role in extracting the most relevant
multimedia data.

4.6 Other Applications


Some other recent applications of query expansion are:

• Plagiarism detection
• Event search
• Text classification
• Patent retrieval
• Dynamic processes in IoT
• Classification of e-commerce
5 Query Expansion Working Methodology


The query expansion process consists of four main steps:
1 Preprocessing of data sources
2 Weighting and ranking of query expansion terms
3 Selection of query expansion terms
4 Query reformulation



5.1 Preprocessing of Data Sources
Preprocessing depends on the data sources and approaches used for query expansion rather than
on the user’s query. The primary goal of this step is to extract a set of terms from the data
sources that meaningfully augment the user’s original query.
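
A minimal sketch of term extraction from a textual data source is given below. It assumes the source is plain English text and that preprocessing means lowercasing, tokenizing, and removing stop words; the tiny stop-word list and the name `extract_terms` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in", "by"}

def extract_terms(text, stop_words=STOP_WORDS):
    """Lowercase the text, tokenize on letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]

doc = "Query expansion is the reformulation of a query to improve retrieval."
print(extract_terms(doc))
# ['query', 'expansion', 'reformulation', 'query', 'improve', 'retrieval']
```

Other common preprocessing operations, such as stemming, could be added in the same function, but they are left out here to keep the sketch short.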

5.2 Weighting and Ranking of Query Expansion Terms


In this step of QE, weights and ranks are assigned to the candidate expansion terms. There are many
techniques for weighting and ranking expansion terms. They are classified into four
categories on the basis of the relationship between the query terms and the expansion features:

• One-to-One Association. For example, using WordNet to find synonyms and similar terms for the
query terms.
• One-to-Many Association. Correlates one query term to many expanded query terms.
• Feature Distribution of Top-Ranked Documents. Deals with the top retrieved documents
from the initial query and considers the top-weighted terms from these documents.
• Query Language Modeling. Constructs a statistical model for the query and chooses the
expansion terms with the highest probability.
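
As an illustration of association-based weighting, the sketch below scores candidate terms by how often they co-occur with the query terms. The Jaccard coefficient is used here as one possible association measure; that choice, the toy corpus, and the function names are assumptions and are not taken from the text.

```python
# Toy corpus (hypothetical): each document is represented as a set of terms.
DOCS = [
    {"novel", "book", "fiction", "author"},
    {"novel", "approach", "method"},
    {"book", "author", "library"},
    {"novel", "book", "author"},
]

def jaccard(t1, t2, docs=DOCS):
    """Co-occurrence strength of two terms: |D1 ∩ D2| / |D1 ∪ D2|,
    where D1 and D2 are the sets of documents containing each term."""
    d1 = {i for i, d in enumerate(docs) if t1 in d}
    d2 = {i for i, d in enumerate(docs) if t2 in d}
    union = d1 | d2
    return len(d1 & d2) / len(union) if union else 0.0

def rank_candidates(query_terms, docs=DOCS):
    """Score each non-query term by summing its association with every
    query term, then sort by descending score (ties broken alphabetically)."""
    vocab = set().union(*docs) - set(query_terms)
    scores = {c: sum(jaccard(q, c, docs) for q in query_terms) for c in vocab}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

for term, score in rank_candidates(["novel"]):
    print(f"{term}: {score:.2f}")
```

With a single query term this behaves like a one-to-one score; with several query terms the sum rewards candidates related to the query as a whole.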
5.3 Selection of Query Expansion Terms
After weighting and ranking the expansion terms, the top-ranked terms are selected for query
expansion. Term selection is done on an individual basis; mutual dependence between terms is
not considered.
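
Selecting the top-ranked terms can be sketched as a simple top-k cut. The cutoff `k`, the optional `min_weight` threshold, and the example weights are hypothetical values chosen for illustration.

```python
def select_terms(weighted_terms, k=3, min_weight=0.0):
    """Keep the k highest-weighted candidates; each term is judged on its
    own weight, so dependence between terms is ignored."""
    ranked = sorted(weighted_terms.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, w in ranked[:k] if w > min_weight]

candidates = {"book": 0.50, "author": 0.50, "fiction": 0.33, "library": 0.0}
print(select_terms(candidates, k=3))
# ['author', 'book', 'fiction']
```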

5.4 Query Reformulation


This is the last step of query expansion, where the expanded query is reformulated to achieve better
results when used for retrieving relevant documents. The reformulation is based on the weights
assigned to the individual terms of the expanded query, known as query reweighting. It can be
formulated as:

$$w'_{t,q_e} = (1 - \lambda) \cdot w_{t,q} + \lambda \cdot W_t$$

where $w'_{t,q_e}$ is the new weight of term $t$ in the expanded query $q_e$, $w_{t,q}$ is the weight of
$t$ in the original query $q$, $W_t$ is the weight assigned to the expansion term $t$, and $\lambda$ is a
weighting parameter that balances the relative contribution of the original query terms and the
expansion terms.
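
The reweighting formula can be applied term by term, as in the following sketch. Treating a term that is missing from one side as having weight 0 is an assumption made here, as are the example weights and the value of `lam`.

```python
def reweight(original_weights, expansion_weights, lam=0.5):
    """Apply w' = (1 - lam) * w_q + lam * W_t to every term of the expanded query.
    A term missing from one side is treated as weight 0 there (assumption)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    terms = set(original_weights) | set(expansion_weights)
    return {
        t: (1 - lam) * original_weights.get(t, 0.0)
           + lam * expansion_weights.get(t, 0.0)
        for t in terms
    }

w_q = {"novel": 1.0}                   # weights in the original query
w_exp = {"novel": 0.8, "book": 0.5}    # weights from the expansion step
print(reweight(w_q, w_exp, lam=0.3))
```

A small `lam` keeps the expanded query close to the user's original wording, while a large `lam` lets the expansion terms dominate.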

6 Classification of Query Expansion Approaches

On the basis of the data sources used in query expansion, several approaches have been proposed.
These approaches can be classified into two main groups:

1 Global analysis
In global analysis, query expansion techniques implicitly select expansion terms from hand-built
knowledge resources or from large corpora to expand or reformulate the initial query.

Global analysis can be further split into four subclasses:

• Linguistic-based approaches
• Corpus-based approaches
• Search log-based approaches
• Web-based approaches

2 Local analysis
Local analysis includes query expansion techniques that select expansion terms from the collection
of documents retrieved in response to the user’s initial (unmodified) query.

Local analysis can be further split into two subclasses:

• Relevance Feedback (RF)
• Pseudo-relevance Feedback (PRF)
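
Pseudo-relevance feedback can be sketched as follows: the top-ranked documents for the initial query are assumed to be relevant, and their most frequent terms are added to the query. The toy collection, the overlap-based ranking in `retrieve`, and the frequency-based term weighting are simplifying assumptions made only for this illustration.

```python
from collections import Counter

# Toy collection (hypothetical): documents are already tokenized.
DOCS = {
    "d1": ["novel", "book", "fiction", "author"],
    "d2": ["novel", "book", "author", "prize"],
    "d3": ["weather", "forecast", "rain"],
}

def retrieve(query, docs=DOCS):
    """Rank documents by how many query terms they contain (toy scoring)."""
    scored = [(sum(t in words for t in query), doc_id)
              for doc_id, words in docs.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

def prf_expand(query, docs=DOCS, top_docs=2, top_terms=2):
    """Assume the top-ranked documents are relevant and add their most
    frequent non-query terms to the query."""
    feedback = retrieve(query, docs)[:top_docs]
    counts = Counter(w for d in feedback for w in docs[d] if w not in query)
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_terms]
    return list(query) + [term for term, _ in best]

print(prf_expand(["novel"]))
# ['novel', 'author', 'book']
```

In relevance feedback proper, the `feedback` list would come from documents the user has marked as relevant instead of from the top of the ranking.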

7 Advantages of Query Expansion


One of the main benefits of QE is that it increases the chance of retrieving relevant
information on the Internet that would not be retrieved using the original query alone. While this
improves recall, the attempt to retrieve many relevant documents can adversely affect precision.
Often the user’s original query is not sufficient to retrieve the information the user is
looking for. In this situation, QE plays a crucial role in improving Internet searches. For example,
if the user’s original query is “novel”, it is unclear what the user wants: the user may be
searching for a fictional narrative book, or may be interested in something new or unusual.
QE therefore expands the original query “novel” to terms such as “novel book”, “book”, “new”,
and “novel approach”.
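
The trade-off between recall and precision can be made concrete with a small worked example. The document identifiers and result lists below are invented purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / number retrieved; recall = hits / number relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"d1", "d2", "d3", "d4"}               # documents the user wants
original = ["d1", "d2"]                            # results for the short query
expanded = ["d1", "d2", "d3", "d7", "d8", "d9"]   # results after expansion

print(precision_recall(original, relevant))   # (1.0, 0.5)
print(precision_recall(expanded, relevant))   # (0.5, 0.75)
```

In this toy case the expanded query finds one more relevant document, so recall rises from 0.5 to 0.75, but it also brings in three irrelevant ones, so precision falls from 1.0 to 0.5.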



8 Conclusion
A lack of query terms increases the ambiguity in choosing among the many possible
meanings of the query terms, which heightens the problem of vocabulary mismatch. Query
expansion approaches can be manual, automatic, or interactive (such as linguistic, corpus-based,
web-based, search log-based, RF, and PRF); they expand the user’s original query on the basis of
query features and available data sources. Reducing query ambiguity in this way is important for
everyday information seeking.

9 References
https://en.wikipedia.org/wiki/Query_expansion

https://queryunderstanding.com/query-expansion-2d68d47cf9c8

https://nlp.stanford.edu/IR-book/html/htmledition/query-expansion-1.html

Abdulla, A.A.A., Lin, H., Xu, B., Banbhrani, S.K.: Improving biomedical information retrieval by linear
combinations of different query expansion techniques. BMC Bioinformatics 17(7), 238 (2016)
