IR Chapter 1

The document provides an introduction to Information Retrieval (IR), distinguishing it from data retrieval and outlining its significance in managing unstructured information. It covers the basic structure of IR systems, the retrieval process, and the objectives of these systems, emphasizing their role in efficiently finding relevant information. Additionally, it discusses the evolution of IR from traditional document management to modern web search applications.

Introduction to Information Storage & Retrieval

Class: Information Science
Year III, Semester 1
Instructor: Tizita Z.
Chapter One: Introduction
Contents
• IR and IR systems
• Data versus information retrieval
• IR and the retrieval process
• Basic structure of an IR system

Information Retrieval
▪ It is a research field traditionally separate from Databases
▪ Goes back to IBM, RAND and Lockheed in the 1950s
▪ G. Salton at Cornell in the 1960s
▪ Lots of research since then
▪ Products traditionally separate
▪ Originally, document management systems for libraries,
government, law, etc.
▪ Gained prominence in recent years due to web search

What is IR?
▪ The study of methods and structures used to
represent and access information (Witten et al.)
▪ IR deals with the representation, storage,
organization, and access to information items
(Salton).
▪ Information Retrieval (IR) is finding material
(usually documents) of an unstructured nature
(usually text) that satisfies an information need from
within large collections (usually stored on
computers) (Manning et al.).

What is IR?

▪ Information retrieval is a problem-oriented discipline,
concerned with the problem of the effective and
efficient transfer of desired information between
human generator and human user.
▪ IR can be defined as a task: to find a few among many
▪ It is largely motivated by the situation of information
overload and acts as a remedy to it
▪ When defining IR, we need to be aware that there is a
broad sense and a narrow sense.

Broad Sense of IR
▪ It is a discipline that finds information that people
want
▪ The motivations behind it include
▪ Humans’ desire to understand the world and to gain
knowledge
▪ Acquire sufficient and accurate information/answer to
accomplish a task

Broad Sense of IR
▪ Because finding information can be done in so many
different ways, IR involves:
• Classification
• Re-Clustering
• Recommendation
• Social network
• Interpreting natural languages
• Question answering
• Knowledge bases
• Human-computer Interaction
• Psychology and Cognitive Science

Narrow Sense of IR

• It is ‘search’
– Mostly searching for documents
• It is a computer science discipline that designs and implements
algorithms and tools to help people find information that they
want
– from one or multiple large collections of materials (text or
multimedia, structured or unstructured, with or without
hyperlinks, with or without metadata, in a foreign language
or not),
– where people can be a single user or a group,
– who initiate the search process by an information need,
– and the resulting information should be relevant to the
information need (based on the judgment of the person
who starts the search)

Narrow Sense of IR
• It helps people find relevant documents
– from one large collection of material (which is the Web or
a TREC collection),
– where there is a single user,
– who initiates the search process by a query driven by an
information need,
– and, the resulting documents should be ranked (from the
most relevant to the least) and returned in a list

Relationships to Sister Disciplines

Databases vs. IR

Data Retrieval vs. Information Retrieval

▪ Information retrieval (IR) and data retrieval (DR) are two
related but distinct concepts in the field of data management.

▪ While both involve the search for specific data, they differ in
their scope and purpose.

▪ In this section we compare the two concepts by definition,
scope, purpose, and other aspects.

Definition
Information Retrieval
▪ It is the process of retrieving relevant information from a
collection of unstructured or semi-structured data.
▪ It involves the use of search engines or other information
retrieval systems to find documents or other sources of
information that match a particular query.
Data Retrieval
▪ It is the process of retrieving specific data from a structured
database or other data storage system.
▪ It involves the use of queries or other data retrieval techniques
to extract the desired data from a larger data set.

Scope
Information Retrieval
▪ The scope of information retrieval is generally broader than
that of data retrieval.
▪ Information retrieval systems are designed to search large
collections of data, such as the internet or a digital library, and
return a set of relevant documents or other sources of
information.
Data Retrieval
▪ The scope of data retrieval is narrower: it retrieves specific
data from a structured database or other data storage system.
▪ It uses queries or other data retrieval techniques to extract
the desired data from a larger data set.

Purpose
▪ The purpose of Information Retrieval is to help users find
relevant information quickly and efficiently.
▪ It is often used in situations where the user is not sure exactly
what they are looking for, and needs to explore a large
collection of data to find relevant information.
▪ The purpose of Data Retrieval is to extract specific data
elements for analysis or processing.
▪ It is often used in business intelligence or data analysis
applications, where the user needs to extract specific data
elements from a larger data set for further analysis.

Data Retrieval vs. Information Retrieval

                      Data retrieval                          Information retrieval
Data organization     Structured                              Unstructured
Fields                Clear semantics (ID, name, age)         No fields (other than text)
Query language        Artificial (defined, e.g. SQL)          Free text (“natural language”), Boolean
Matching              Exact (results are always “correct”)    Partial match, best match
Query specification   Complete                                Incomplete
Items wanted          Matching                                Relevant
Accuracy              100%                                    < 50%
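The contrast between exact matching and best matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `people` table, the three sample documents, and the word-overlap score are all hypothetical and do not come from the slides.

```python
import sqlite3

# Data retrieval: a structured table with named fields, queried exactly.
# The table name and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Abebe", 30), (2, "Sara", 25), (3, "Kebede", 30)],
)
exact_rows = conn.execute(
    "SELECT name FROM people WHERE age = 30 ORDER BY id"
).fetchall()

# Information retrieval: free text, scored by partial (best) match.
documents = {
    "d1": "libraries store books and journals",
    "d2": "web search engines rank pages",
    "d3": "search engines index web documents and books",
}


def overlap_score(query: str, text: str) -> int:
    """Count how many distinct query words occur in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def best_match(query: str) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs with a non-zero score, best first."""
    scored = [(doc_id, overlap_score(query, text)) for doc_id, text in documents.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = best_match("web search books")
```

The SQL query returns only rows that satisfy the condition exactly, while `best_match` returns every document that shares at least one word with the query, ordered by how well it matches.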
Information Retrieval systems
▪ An Information Retrieval System is a system that is capable of
storage, retrieval, and maintenance of information.
▪ It consists of a software program that helps a user
find the information they need.
▪ Modern information retrieval systems deal not only with
textual information but also with multimedia information
comprising text, audio, images and video.
▪ They deal with storage, organization and access to text, as well
as multimedia information resources.

Objectives of Information Retrieval Systems

▪ The general objective of an Information Retrieval System is to
minimize the overhead of a user locating needed information.
▪ Overhead can be expressed as the time a user spends in all of
the steps leading to reading an item containing the needed
information, e.g.:
▪ Query generation
▪ Query execution
▪ Scanning results of query to select items to read
▪ Reading non-relevant items

Scope of IR System
Unstructured Information
▪ This information either does not have a pre-defined data model
or is not organized in a pre-defined order.
▪ Unstructured information is typically text-heavy, but may
contain datasets such as dates, numbers, and facts as well.
▪ Examples of “unstructured data” may include:
▪ books,
▪ journals,
▪ documents,
▪ metadata,
▪ health records,
▪ audio,
▪ video,
▪ analog data,
▪ images,
▪ files, and
▪ unstructured text such as the body of an e-mail message, Web
page, or word-processor document.

Scope of IR System
Structured Information

▪ It is information that is already structured in fields, such as
“name”, “age”, “gender”, “hobby”, “address”, “profession”,
“salary”.
▪ Structured data first depends on creating a data model – a
model of the types of business data that will be recorded and
how they will be stored, processed and accessed.
▪ Structured data can be handled easily as they can be easily
entered, stored, queried and analyzed.

Structure of an IR System

▪ An Information Retrieval System serves as a bridge between
the world of authors and the world of readers/users.
▪ Writers present a set of ideas in a document using a set of
concepts.
▪ Users query the IR system for relevant documents that satisfy
their information need.

Structure of an IR System
▪ The black box is the information retrieval system.
▪ The notion of relevance is at the centre of IR.
▪ The primary goal of an IR system is to retrieve all the
documents which are relevant to a user query while retrieving
as few non-relevant documents as possible.
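This goal is commonly quantified with two standard IR measures, recall (the share of relevant documents that were retrieved) and precision (the share of retrieved documents that are relevant). The slides do not name these measures, so the sketch below is an illustration with hypothetical document ids, not part of the original material.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments for a single query.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
p, r = precision_recall(retrieved, relevant)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so retrieving "all relevant documents" pushes recall up while retrieving "few non-relevant documents" pushes precision up.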

Typical IR System Architecture

IR System vs. Web Search System
[Figure: a web spider collects pages into the document corpus; the IR system takes the query string and the corpus and returns ranked relevant documents (1. Page1, 2. Page2, 3. Page3, ...).]

The Retrieval Process

The Retrieval Process
• It is necessary to define the text database before any of the
retrieval processes are initiated
• This is usually done by the manager of the database and includes
specifying the following
– The documents to be used
– The operations to be performed on the text
– The text model to be used (the text structure and what
elements can be retrieved)
• The text operations transform the original documents and the
information needs and generate a logical view of them
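As a minimal sketch of what a text operation might look like, the function below lowercases the text, splits it into tokens, and drops stopwords, and it is applied to a document and to an information need alike. The stopword list and the function name `logical_view` are assumptions made for illustration; the slides do not specify which operations are used.

```python
import re

# Hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def logical_view(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


# The same operation is applied to documents and to information needs.
doc_terms = logical_view("The Retrieval of Information in Large Collections")
query_terms = logical_view("information retrieval")
```

Because both sides pass through the same function, the query terms can be compared directly with the document terms.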

The Retrieval Process

▪ Once the logical view of the documents is defined,
the database module builds an index of the text.
▪ An index is a critical data structure
▪ It allows fast searching over large volumes of data
▪ Different index structures might be used, but the most
popular one is the inverted file.
▪ Once the document database is indexed, the
retrieval process can be initiated.
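An inverted file maps each term to the documents that contain it, so a lookup does not have to scan every document. The sketch below is a simplified in-memory version with three hypothetical documents; it splits on whitespace only and stores no positions or frequencies, which a real index often would.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to the sorted list of document ids that contain it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical mini-collection.
docs = {
    "d1": "information retrieval systems",
    "d2": "database systems",
    "d3": "web information",
}
index = build_inverted_index(docs)
```

Looking up `index["systems"]` gives the documents containing that term directly, without reading the text of any document again.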

The Retrieval Process

▪ The user first specifies a user need, which is then parsed
and transformed by the same text operations applied to
the text.
▪ Next, query operations are applied before the actual
query, which provides a system representation for the
user need, is generated.
▪ The query is then processed to retrieve documents.
▪ Before the retrieved documents are sent to the user, they
are ranked according to the likelihood of relevance.
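The steps above can be sketched as follows. The query goes through the same text operation as the documents, and matching documents are ordered by a score. Summed term frequency is used here purely as a stand-in for "likelihood of relevance"; the slides do not say which ranking function is meant, and the documents are hypothetical.

```python
from collections import Counter


def terms(text: str) -> list[str]:
    """Stand-in text operation: lowercase and split on whitespace."""
    return text.lower().split()


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Score each document by the summed frequency of the query terms, best first."""
    query_terms = set(terms(query))
    results = []
    for doc_id, text in docs.items():
        counts = Counter(terms(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents with no query term are not retrieved
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "d1": "retrieval of information and information storage",
    "d2": "database storage",
    "d3": "web retrieval",
}
ranking = rank("Information Retrieval", docs)
```

The document that mentions the query terms most often comes first, and the document with no query term is left out of the result list.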

The Retrieval Process

▪ The user then examines the set of ranked documents in
the search for useful information.
▪ Two choices for the user:
▪ Reformulate query, run on entire collection, or
▪ Reformulate query, run on result set
▪ At this point, the user might pinpoint a subset of the documents
seen as definitely of interest and initiate a user feedback
cycle.
▪ In such a cycle, the system uses the documents selected
by the user to change the query formulation.
▪ Hopefully, this modified query is a better
representation of the real user need.
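One very simple way to picture the feedback cycle is query expansion: terms that occur often in the documents the user selected are appended to the query. This is a deliberately naive sketch, not the method the slides have in mind (they do not name one), and the function name, the `extra_terms` parameter, and the sample texts are hypothetical.

```python
from collections import Counter


def expand_query(query: str, selected_docs: list[str], extra_terms: int = 2) -> str:
    """Append the most frequent new terms from the user-selected documents."""
    query_terms = query.lower().split()
    counts = Counter(
        term
        for text in selected_docs
        for term in text.lower().split()
        if term not in query_terms
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    candidates = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    added = [term for term, _ in candidates[:extra_terms]]
    return " ".join(query_terms + added)


# The user marked these two documents as definitely of interest.
selected = ["jaguar car engine", "jaguar car dealer car"]
new_query = expand_query("jaguar", selected)
```

The ambiguous one-word query picks up terms from the selected documents, so the reformulated query is closer to what the user actually wanted.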