
Chapter 1 Introduction To ISR

Information storage and retrieval involves storing information from various sources in a way that allows it to be easily retrieved upon request. Information storage involves collecting information and storing it physically or electronically. Information retrieval is the process of searching for and delivering relevant information to users. It involves indexing documents, representing queries and documents, ranking documents by relevance to queries, and returning search results. Common information retrieval models include probabilistic, vector space, Boolean, and language models.

Uploaded by

Tolosa Tafese

Information Storage and Retrieval

Chapter One

Introduction to ISR
What is Information Storage and Retrieval?
o It is a system used to store information gathered from different sources in
such a way that it can be retrieved easily and effectively upon request.
What is Information Storage?
o Information storage is the collection of information from different sources and its
storage either in a storage room (maintaining paper records) or on storage devices
such as hard disks, DVDs, and CDs. The information may take any form, such as
audio, video, or text.
What is Information Retrieval?
o An IR system mainly focuses on electronically searching for and retrieving
stored documents.
o Information Retrieval (IR) can be defined as a software program that deals
with the organization, storage, retrieval, and evaluation of information
from document repositories, particularly textual information.
o Information retrieval is the process of searching for, fetching, and serving
information to the users who request it.
o An IR system is capable of performing operations such as
▪ adding documents to the database,
▪ modifying or deleting them from the database,
▪ searching, and
▪ serving appropriate documents to the users.
o Information retrieval is the activity of obtaining documents relevant to the user's
needs from a collection of documents.
o An IR system assists users in finding the information they require, but it does
not explicitly return answers to their questions.

AUWC Compiled by Kassahun M. Page 1


o Information retrieval is the activity of obtaining information resources relevant to a
user’s information need from a collection of information resources.
Elements of an information retrieval process:
o Information needs (users express them in the form of queries)
o Information (re)sources, most often unstructured (text, images, video,
audio, etc.)
o A system/method/model for identifying (re)sources relevant for a given
information need (usually from a large collection of information resources)

Figure 1: Basic information retrieval system

A static, or relatively static, document collection is indexed prior to
any user query. When a query is issued, the documents deemed
relevant to it are ranked by their computed similarity to the
query and presented to the user.
Information Retrieval (IR) is devoted to finding relevant documents, not finding simple
matches to patterns.
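The index-first, query-later flow described above can be sketched as follows. This is a minimal illustration, not the design of any particular IR system: the three-document collection, the function names build_index and search, and the scoring rule (count of distinct query terms a document contains) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = Counter()
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical collection, indexed once before any query arrives.
docs = {
    "d1": "information retrieval ranks documents",
    "d2": "databases store structured data",
    "d3": "retrieval of documents from databases",
}
index = build_index(docs)
results = search(index, "information retrieval documents")
```

Here d1 contains all three query terms and d3 contains two, so both are returned in that order, while d2 shares no term with the query and is not returned at all.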
Automated information retrieval (IR) systems were originally developed to help manage
the huge scientific literature that has developed since the 1940s. Many university,
corporate, and public libraries now use IR systems to provide access to books, journals,
and other documents. Commercial IR systems offer databases containing millions of
documents in myriad subject areas. Dictionary and encyclopedia databases are now
widely available for PCs. IR has been found useful in such disparate areas as office
automation and software engineering.
Data and Information Retrieval

o Data retrieval, in the context of an IR system, consists mainly of
determining which documents of a collection contain the keywords in
the user query which, most frequently, is not enough to satisfy the user
information need.
o In fact, the user of an IR system is concerned more with retrieving
information about a subject than with retrieving data which satisfies a
given query.
o A data retrieval language aims at retrieving all objects which satisfy
clearly defined conditions such as those in a regular expression or in a
relational algebra expression.
▪ Thus, for a data retrieval system, a single erroneous object among a
thousand retrieved objects means total failure.
o For an information retrieval system, however, the retrieved objects might be
inaccurate and small errors are likely to go unnoticed.
The main reason for this difference is that
▪ Information retrieval usually deals with natural language text which is
not always well structured and could be semantically ambiguous.
▪ A data retrieval system (such as a relational database) deals with data that
has a well-defined structure and semantics.
Data retrieval, while providing a solution to the user of a database system, does not
solve the problem of retrieving information about a subject or topic.
To be effective in its attempt to satisfy the user information need,
o The IR system must somehow 'interpret' the contents of the information
items (documents) in a collection and rank them according to a degree
of relevance to the user query. This 'interpretation' of a document's content
involves extracting syntactic and semantic information from the document
text and using this information to match the user information need.

The difficulty is not only knowing how to extract this information but also
knowing how to use it to decide relevance. Thus, the notion of relevance is at the
center of information retrieval. In fact, the primary goal of an IR system is to
retrieve all the documents which are relevant to a user query while retrieving
as few non-relevant documents as possible.
Difference between Information Retrieval and Data Retrieval
| Information Retrieval | Data Retrieval |
|---|---|
| A software program that deals with the organization, storage, retrieval, and evaluation of information from document repositories, particularly textual information. | Deals with obtaining data from a database management system such as ODBMS. It is a process of identifying and retrieving data from the database, based on the query provided by the user or application. |
| Retrieves information about a subject. | Determines the keywords in the user query and retrieves the data. |
| Small errors are likely to go unnoticed. | A single erroneous object means total failure. |
| Not always well structured and is semantically ambiguous. | Has a well-defined structure and semantics. |
| Does not provide a solution to the user of the database system. | Provides solutions to the user of the database system. |
| The results obtained are approximate matches. | The results obtained are exact matches. |
| Results are ordered by relevance. | Results are not ordered by relevance. |
| It is a probabilistic model. | It is a deterministic model. |
o Data retrieval systems retrieve data directly from database management systems such as
ODBMS by identifying keywords in the queries provided by users and matching them
against the data in the database.
o An information retrieval system, by contrast, is a set of algorithms or programs
that store, retrieve, and evaluate document and query representations,
especially text-based ones, and display results based on similarity.
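The contrast between exact, unordered data retrieval and approximate, ranked information retrieval can be illustrated with a small sketch. The records, the two function names, and the use of Jaccard overlap as the similarity measure are assumptions chosen for illustration only; real systems use richer conditions and scoring functions.

```python
# Hypothetical records standing in for rows in a database or documents in a collection.
records = [
    {"id": 1, "title": "database indexing basics"},
    {"id": 2, "title": "indexing text documents for search"},
    {"id": 3, "title": "ranking search results"},
]

def data_retrieval(records, keyword):
    """Exact and unordered: a record either satisfies the condition or it does not."""
    return [r["id"] for r in records if keyword in r["title"].split()]

def information_retrieval(records, query):
    """Approximate and ordered: score each record by its term overlap with the query."""
    q_terms = set(query.split())
    scored = []
    for r in records:
        d_terms = set(r["title"].split())
        score = len(q_terms & d_terms) / len(q_terms | d_terms)  # Jaccard overlap
        if score > 0:
            scored.append((r["id"], score))
    # Python's sort is stable, so equal scores keep their original order.
    return sorted(scored, key=lambda pair: -pair[1])

exact = data_retrieval(records, "indexing")
ranked = information_retrieval(records, "search indexing documents")
```

The exact lookup returns records 1 and 2 with no notion of which is better, while the ranked search returns all three records, with record 2 first because it shares the most terms with the query.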
What is an IR Model?

o An Information Retrieval (IR) model selects and ranks the documents that
the user requires, as expressed in the form of a query.
o The documents and the queries are represented in a similar manner, so that
document selection and ranking can be formalized by a matching function
that returns a retrieval status value (RSV) for each document in the
collection.
o Many Information Retrieval systems represent document contents by
a set of descriptors, called terms, belonging to a vocabulary V.
o An IR model determines the query-document matching function according
to four main approaches:
▪ The estimation of the probability of the user’s relevance rel for each
document d and query q with respect to a set R_q of training
documents: Prob(rel | d, q, R_q)
Types of IR Models

Components of Information Retrieval/ IR Model

1. Acquisition: In this step, documents and other objects are selected from
various web resources that consist of text-based documents.
o The required data is collected by web crawlers and stored in the
database.
2. Representation: This step consists of indexing, which covers free-text terms,
controlled vocabulary, and both manual and automatic techniques.
o Example: Abstracting, which involves summarizing, and bibliographic
description, which records author, title, sources, data, and metadata.
3. File Organization: There are two basic file organization methods, which can also be combined.
1. Sequential: Records are stored document by document.
2. Inverted: Records are stored term by term, with a list of records under each term.
3. Combination of both.
4. Query: An IR process starts when a user enters a query into the system.
o Queries are formal statements of information needs, for example,
search strings in web search engines.

o In information retrieval, a query does not uniquely identify a single
object in the collection. Instead, several objects may match the query,
perhaps with different degrees of relevance.
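The sequential and inverted file organizations named in step 3 can be compared with a short sketch. The three documents and the names sequential_file, inverted_file, sequential_lookup, and inverted_lookup are hypothetical; the point is only that a sequential file is scanned record by record, whereas an inverted file is looked up by term.

```python
from collections import defaultdict

# Hypothetical collection of three short documents.
documents = {
    1: "storage of information",
    2: "retrieval of information",
    3: "storage devices",
}

# Sequential organization: one record per document, stored document by document.
sequential_file = [(doc_id, text.split()) for doc_id, text in documents.items()]

def sequential_lookup(term):
    """Scan every record in order and keep those that contain the term."""
    return [doc_id for doc_id, terms in sequential_file if term in terms]

# Inverted organization: one entry per term, with the list of records under it.
inverted_file = defaultdict(list)
for doc_id, terms in sequential_file:
    for term in set(terms):
        inverted_file[term].append(doc_id)

def inverted_lookup(term):
    """Jump straight to the list of records stored under the term."""
    return inverted_file.get(term, [])
```

Both lookups return the same records for a given term. The sequential scan touches every document on each query, while the inverted file pays that cost once, when it is built.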
User Interaction with Information Retrieval System

The User Task: The user first translates the information need into a query.
o In an information retrieval system, the query is a set of words that conveys the
semantics of the information required, whereas in a data retrieval system, a query
expression conveys the constraints that the retrieved objects must satisfy.
▪ Example: A user sets out to search for one thing but ends up looking at
something else. In that case the user is browsing rather than
searching. The above figure shows the interaction of the user through
these different tasks.
Logical View of the Documents: Historically, documents were represented through a
set of index terms or keywords.
o Modern computers can represent a document by its full set of words. The set of
representative keywords can then be reduced by eliminating stop
words, i.e. articles and connectives. Such operations are called text operations.
Text operations reduce the complexity of the document representation from full
text to a set of index terms.
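Stop-word elimination, the text operation described above, can be sketched as follows. The stop-word list here is a small illustrative assumption (real lists are much longer), and index_terms is a hypothetical helper name.

```python
# Illustrative stop-word list of articles and connectives; real lists are longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "to", "is"}

def index_terms(text):
    """Reduce full text to a sorted list of distinct index terms."""
    # Strip surrounding punctuation and lowercase each word.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return sorted({w for w in words if w and w not in STOP_WORDS})

full_text = "The index is a data structure, and the index speeds retrieval of documents."
terms = index_terms(full_text)
```

The thirteen words of the sentence reduce to six index terms: data, documents, index, retrieval, speeds, and structure.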

Past, Present, and Future of Information Retrieval


1. Early Developments: As the demand for information grew, it became necessary
to build data structures that give faster access.
The index is the data structure used for faster retrieval of information. For
centuries, indexes were built by manual categorization into hierarchies.
2. Information Retrieval in Libraries: Libraries were the first to adopt IR
systems for information retrieval.
▪ The first generation consisted of the automation of previous technologies,
and search was based on author name and title.
▪ The second generation added searching by subject heading,
keywords, etc.
▪ The third generation added graphical interfaces,
electronic forms, hypertext features, etc.
3. The Web and Digital Libraries: The Web is cheaper than many other sources of
information, it provides greater access to networks thanks to digital
communication, and it offers free access to publishing on a larger medium.
Types of Information Retrieval Model
An information retrieval model comprises the following four key elements:
1. D − Document representation.
2. Q − Query representation.
3. F − A framework to match and establish a relationship between D and Q.
4. R(q, d_i) − A ranking function that determines the similarity between the
query and the document to display relevant information.
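One way to picture the four elements is as a small data structure with pluggable parts. The sketch below is an assumption-laden illustration: the class name IRModel, the helpers bag_of_terms and query_coverage, and the choice of "fraction of query terms found in the document" as R(q, d_i) are all hypothetical, chosen only to make the roles of D, Q, F, and R concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Terms = FrozenSet[str]

@dataclass
class IRModel:
    represent_document: Callable[[str], Terms]   # D: document representation
    represent_query: Callable[[str], Terms]      # Q: query representation
    rank: Callable[[Terms, Terms], float]        # R(q, d_i): ranking function

    def retrieve(self, query: str, collection: Dict[str, str]) -> List[Tuple[str, float]]:
        """F: match the query representation against every document representation."""
        q = self.represent_query(query)
        scored = [(doc_id, self.rank(q, self.represent_document(text)))
                  for doc_id, text in collection.items()]
        # Highest score first; ties are broken by document id.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

def bag_of_terms(text: str) -> Terms:
    """Represent text as the set of its lowercase terms."""
    return frozenset(text.lower().split())

def query_coverage(q: Terms, d: Terms) -> float:
    """Fraction of query terms that also appear in the document."""
    return len(q & d) / len(q) if q else 0.0

model = IRModel(bag_of_terms, bag_of_terms, query_coverage)
collection = {"d1": "boolean retrieval model", "d2": "vector space model", "d3": "query logs"}
ranking = model.retrieve("vector model", collection)
```

Swapping in a different representation or ranking function changes the model without touching the matching framework, which is the sense in which the four elements are separate.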

There are three types of Information Retrieval (IR) models:

1. Classical IR Model: It is built on basic mathematical concepts and is
the most widely used type of IR model. Classical IR models
can be implemented with ease.
▪ Examples include the Vector-space, Boolean, and Probabilistic IR
models.
▪ In the simplest case, the Boolean model, retrieval depends only on whether
documents contain the terms specified in the query, and there is no ranking
or grading of any kind.
▪ The different classical IR models take document representation,
query representation, and the retrieval/matching function into account in
their modeling.
2. Non-Classical IR Model: These differ from classical models in that they are
built upon propositional logic.
▪ Examples of non-classical IR models include Information Logic,
Situation Theory, and Interaction models.
3. Alternative IR Model: These take the principles of the classical IR models and
build on them to create more functional models, such as the Cluster model,
the Fuzzy Set model (an alternative set-theoretic model), the Latent Semantic
Indexing (LSI) model, and the Generalized Vector Space Model (an alternative
algebraic model).
Let’s understand the most-adopted similarity-based classical IR models in further
detail:
1. Boolean Model: This model requires the information need to be translated into a
Boolean query, against which each document is tested. A document is
returned as a match when the Boolean expression evaluates to true for it.
The query uses the Boolean operations AND, OR, and NOT to
combine multiple terms according to what the user asks.

2. Vector Space Model: This model represents documents and queries as
vectors and retrieves documents according to how similar they are to the query.
The vectors used to rank search results can be of two types:
▪ Binary, in the Boolean VSM.
▪ Weighted, in the non-binary VSM.
3. Probability Distribution Model: In this model, documents are
treated as distributions of terms, and queries are matched based on the
similarity of these representations. Matching uses entropy or the
expected utility of the document. There are two types:
▪ Similarity-based Probability Distribution Model
▪ Expected-utility-based Probability Distribution Model
4. Probabilistic Model: The probabilistic model is rather simple and uses
probability ranking to display results. Put simply, documents are
ranked by the probability that they are relevant to the query.
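The first three models in the list above can be compared on one tiny collection. This is a sketch under stated assumptions, not a reference implementation: the documents and queries are invented, the weighted vectors use raw term frequency (tf-idf is a common alternative), and relative entropy (KL divergence) with add-one smoothing is just one possible entropy-based similarity for the probability distribution model. The fourth, probabilistic model is left out because it needs relevance estimates that the text does not specify.

```python
import math
from collections import Counter

# Hypothetical collection, already tokenized by whitespace.
docs = {
    "d1": "retrieval retrieval model",
    "d2": "retrieval model model storage",
    "d3": "storage devices",
}
tokens = {d: text.split() for d, text in docs.items()}

# 1. Boolean model: terms map to document sets, operators to set operations.
def containing(term):
    return {d for d, words in tokens.items() if term in words}

# Query: retrieval AND NOT storage (set difference expresses AND NOT).
boolean_answer = containing("retrieval") - containing("storage")

# 2. Vector space model: cosine similarity between sparse term vectors.
def vector(words, binary):
    """Binary weights (1 if present) or raw term-frequency weights."""
    return {t: (1.0 if binary else float(n)) for t, n in Counter(words).items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def vsm_scores(query, binary):
    q = vector(query.split(), binary)
    return {d: cosine(q, vector(words, binary)) for d, words in tokens.items()}

binary_scores = vsm_scores("retrieval", binary=True)
weighted_scores = vsm_scores("retrieval", binary=False)

# 3. Similarity-based probability distribution model: compare smoothed term
#    distributions with relative entropy; a lower value means more similar.
vocabulary = sorted({t for words in tokens.values() for t in words})

def distribution(words, alpha=1.0):
    """Term distribution over the vocabulary with add-alpha smoothing."""
    counts = Counter(words)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}

def kl_divergence(p, q):
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

q_dist = distribution("retrieval".split())
kl_ranking = sorted(docs, key=lambda d: kl_divergence(q_dist, distribution(tokens[d])))
```

The Boolean query returns an unranked set, the two vector space variants return graded scores (the weighted one separates d1 from d2 more sharply because it counts repeated terms), and the distribution-based ranking orders documents by how closely their term distribution follows the query's.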
Components of Information Retrieval Model
Here are the prerequisites for an IR model:
1. An automated or manually operated indexing system, together with the
techniques and procedures used to index and search.
2. A collection of documents in any one of the following formats: text, image
or multimedia.
3. A set of queries that serve as the input to a system, via a human or machine.
4. An evaluation metric to measure a system’s effectiveness (for
instance, precision and recall), that is, to assess how useful the
information displayed to the user is.
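Precision and recall, the two metrics named in item 4, can be computed as in the sketch below. The document ids and relevance judgments are invented for illustration, and precision_recall is a hypothetical helper name. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical system output and relevance judgments for a single query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
precision, recall = precision_recall(retrieved, relevant)
```

Two of the four retrieved documents are relevant, so precision is 2/4, and two of the three relevant documents were found, so recall is 2/3.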
