Information Retrieval
Information retrieval (IR) is generally considered a subfield of computer science that deals with
the representation, storage, and access of information, and in particular with the organization
and retrieval of information from large database collections. It is the process by which a
collection of data is represented, stored, and searched for the purpose of knowledge discovery
in response to a user request (query). This process involves several stages, beginning with
representing the data and ending with returning relevant information to the user.
Intermediate stages include filtering, searching, matching, and ranking operations. The main
goal of an information retrieval system (IRS) is “finding relevant information or a document that
satisfies user information needs”. To achieve this goal, IRSs usually implement the following
processes:
An IR model specifies the details of the document representation, the query representation and
the retrieval functionality. The fundamental IR models can be classified into Boolean, vector,
probabilistic, and inference network models. The rest of this section briefly describes these
models.
Boolean Model
The Boolean model is the first model of information retrieval and probably also the most
criticised model. The model can be explained by thinking of a query term as an unambiguous
definition of a set of documents. For instance, the query term economic simply defines the set
of all documents that are indexed with the term economic. Using the operators of George
Boole's mathematical logic, query terms and their corresponding sets of documents can be
combined to form new sets of documents. The Boolean model allows the operators of Boolean
algebra, AND, OR, and NOT, to be used for query formulation, but has one major disadvantage:
a Boolean system is not able to rank the returned list of documents. In the Boolean model, a
document is associated with a set of keywords. Queries are likewise expressions of keywords
separated by AND, OR, or NOT/BUT.
The retrieval function in this model treats a document as either relevant or irrelevant. In Figure
2, the retrieved sets are visualised by the shaded areas.
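The set view of Boolean retrieval can be illustrated with a minimal Python sketch. The index contents, document IDs, and the helper name `docs` are made up for illustration; only the mapping of AND, OR, and NOT onto set operations comes from the model itself.

```python
# Toy index (hypothetical data): each term maps to the set of
# document IDs that are indexed with that term.
index = {
    "economic": {1, 2, 4},
    "social": {2, 3},
    "political": {3, 4, 5},
}
all_docs = {1, 2, 3, 4, 5}


def docs(term):
    """Return the set of documents indexed with `term` (empty if unknown)."""
    return index.get(term, set())


# economic AND social  -> set intersection
and_result = docs("economic") & docs("social")

# economic OR social   -> set union
or_result = docs("economic") | docs("social")

# economic NOT political (economic BUT NOT political) -> set difference
not_result = docs("economic") - docs("political")

# NOT political        -> complement with respect to the whole collection
complement = all_docs - docs("political")
```

Each result is a plain set, so every document is either in it or not. Nothing in the result says which matching document is the better one, which is the ranking limitation noted above.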
Inference Network Model
In general, a document can be represented as a set of keywords (or key phrases) that
contribute to the description of its content. During the indexing phase, when the collection and
the storage of the data are performed, texts are usually preprocessed in order to remove stop
words and to perform stemming.
Additionally, the obtained words and stems can be reconnected to their synonyms in order to
create relations between words and concept classes.
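The preprocessing steps can be sketched roughly as follows. The stop-word list, the suffix-stripping rule, and the concept table are deliberately tiny, hypothetical stand-ins; a real system would normally use a full stop list, an established stemmer, and a thesaurus.

```python
import re

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}

# Crude suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mapping from terms to concept classes (synonym groups).
CONCEPTS = {"car": "vehicle", "automobile": "vehicle"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def to_concepts(terms):
    """Replace each term by its concept class when one is known."""
    return [CONCEPTS.get(t, t) for t in terms]


terms = preprocess("The indexing of documents is performed")
```

Here `terms` holds the stemmed keywords that would be stored for the document during indexing, and `to_concepts` shows how synonyms could be folded into one class.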
IR systems try to retrieve all the documents that are relevant to a user query while minimizing
the number of nonrelevant documents retrieved.
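This goal is commonly quantified with two measures, precision and recall. They are not named in the text above and are introduced here only as the usual way to express it: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch, with made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents retrieved, three actually relevant.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 5})
```

Retrieving every relevant document pushes recall toward 1, while retrieving few nonrelevant documents pushes precision toward 1.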
Inversion Indices
Each document can be represented by a list of keywords that describe the contents of the
document for retrieval purposes. Fast retrieval can be achieved if we invert on those keywords.
The keywords are stored, e.g. alphabetically, in the index file; for each keyword we maintain a
list of pointers to the qualifying documents in the postings file. This method is followed by
almost all commercial systems.
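A minimal in-memory sketch of this structure is shown below. The sample documents and the function name `build_inverted_index` are hypothetical, and a single dictionary stands in for the separate index and postings files.

```python
from collections import defaultdict

# Hypothetical sample collection: document ID -> text.
documents = {
    1: "information retrieval systems",
    2: "digital library systems",
    3: "web search and retrieval",
}


def build_inverted_index(docs):
    """Map each keyword to a sorted list of IDs of documents containing it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        # set() ensures each document appears once per keyword.
        for word in set(docs[doc_id].split()):
            postings[word].append(doc_id)
    # Store the keywords alphabetically, as in the index file.
    return dict(sorted(postings.items()))


inverted = build_inverted_index(documents)

# Looking up a keyword returns its postings list directly,
# without scanning the documents themselves.
hits = inverted.get("retrieval", [])
```

The lookup cost depends on the number of keywords and the length of one postings list, not on the total size of the document texts, which is why inverting on keywords gives fast retrieval.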
SEARCHING TECHNIQUES
There are various searching algorithms, including linear search, binary search, and brute-force
search. Some general searching algorithms are described below:
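As a small illustration of two of the algorithms named here, the following sketch implements linear search and binary search over a list of keywords. The keyword list is made up, and binary search assumes the list is already sorted.

```python
def linear_search(items, target):
    """Check every item in turn; return its position or -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step; requires sorted input."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1


# Hypothetical, alphabetically sorted keyword list.
keywords = ["boolean", "index", "query", "ranking", "stemming"]
position = binary_search(keywords, "query")
```

Linear search works on any list but inspects up to every element, whereas binary search needs sorted data and inspects only a logarithmic number of elements.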
Digital Library
A digital library is a library in which collections are stored in digital formats and accessible by
computers. The digital content may be stored locally, or accessed remotely via computer
networks. A digital library is a type of information retrieval system.
Search Engines
A search engine is one of the most practical applications of information retrieval techniques
to large-scale text collections. Web search engines are the best-known examples, but many other
kinds of search exist, such as desktop search, enterprise search, federated search, mobile
search, and social search.
CONCLUSION
In conclusion, information retrieval is the process of searching for and retrieving
knowledge-based information from a collection of documents.