Chapter 1 Introduction To ISR
Evaluations:
Assessment 1 15%
Assessment 2 15%
Final 40%
Project 20%
Quizzes 10%
Introduction to Information Storage and Retrieval
[Figure: the USER reaches the DB through one of two tasks, Retrieval or Browsing]
• The User Task: Retrieval
• It is the process of retrieving information whereby
the main objective is clearly defined from the
onset of the search process.
• The user of a retrieval system has to translate his
information need into a query in the language
provided by the system.
• In this context (i.e. by specifying a set of words),
the user searches for useful information by executing
a retrieval task.
• English-language statement:
"I want a book by J. K. Rowling titled The Chamber of Secrets"
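The translation from an information need to a query can be sketched as follows. This is only an illustration: the `Book` record, the `CATALOGUE` contents, and the `search` helper are hypothetical and stand in for whatever query language a real system provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    author: str
    title: str


# Hypothetical catalogue, invented for this example.
CATALOGUE = [
    Book("J. K. Rowling", "The Chamber of Secrets"),
    Book("J. K. Rowling", "The Goblet of Fire"),
    Book("J. R. R. Tolkien", "The Two Towers"),
]


def normalise(text):
    """Lower-case, drop full stops, and collapse runs of whitespace."""
    return " ".join(text.lower().replace(".", " ").split())


def search(catalogue, **fields):
    """Return the books whose named fields all match the query values."""
    wanted = {name: normalise(value) for name, value in fields.items()}
    return [
        book
        for book in catalogue
        if all(normalise(getattr(book, name)) == value
               for name, value in wanted.items())
    ]


# The English statement, translated into a fielded query.
results = search(CATALOGUE, author="J. K Rowling", title="The Chamber of Secrets")
print(results)
```

The objective is fixed before the search starts, which is what makes this a retrieval task rather than a browsing task.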
• The User Task: Browsing
• It is the process of retrieving information,
whereby the main objective is not clearly defined
from the beginning and whose purpose might
change during the interaction with the system.
Given:
A collection of textual natural-language documents
A user query in the form of a textual string
Output:
A ranked set of documents that are assumed to be
relevant to the user query
Measure of Effectiveness:
Precision: the fraction of the retrieved documents that are relevant
Recall: the fraction of the relevant documents in the whole
collection that are retrieved
Measure of Accuracy
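The two ratios above are commonly called precision and recall. A minimal sketch, assuming the retrieved and relevant documents are given as collections of identifiers (the function name and the sample identifiers are invented for illustration):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved docs / all retrieved docs
    recall    = relevant retrieved docs / all relevant docs in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, five relevant in the collection, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7", "d8", "d9"])
print(p, r)  # 0.5 0.4
```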
Typical IR System Architecture
[Figure: a document corpus and a query string feed the IR system, which returns ranked documents: 1. Doc1, 2. Doc2, 3. Doc3, ...]
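The architecture can be sketched in a few lines, assuming a deliberately simple scoring rule: count how many distinct query terms each document contains. The corpus contents and the function names are invented, and real systems use more refined ranking functions.

```python
def tokenize(text):
    """Split text into lower-case whitespace-separated terms."""
    return text.lower().split()


def rank(corpus, query):
    """Return document ids ordered by the number of shared query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        score = len(query_terms & set(tokenize(text)))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


corpus = {
    "Doc1": "information retrieval systems rank documents",
    "Doc2": "database systems store records",
    "Doc3": "retrieval of information from text documents",
}
ranking = rank(corpus, "information retrieval systems")
print(ranking)  # ['Doc1', 'Doc3', 'Doc2']
```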
Web Search System (e.g.: Google)
[Figure: a web crawler (web spider) gathers pages into the document corpus; the corpus and a query string feed the IR system, which returns ranked documents: 1. Page1, 2. Page2, 3. Page3, ...]
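The crawler's job of collecting the corpus can be sketched as a breadth-first traversal of links. To keep the example runnable without network access, the "web" here is an invented in-memory link graph (`WEB`); a real crawler would fetch pages over HTTP and parse their links.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
WEB = {
    "Page1": ["Page2", "Page3"],
    "Page2": ["Page3"],
    "Page3": ["Page1"],
    "Page4": ["Page1"],
}


def crawl(seed, links, limit=100):
    """Visit pages reachable from seed, breadth first, up to limit pages."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # never enqueue a page twice
                seen.add(target)
                queue.append(target)
    return order


corpus_pages = crawl("Page1", WEB)
print(corpus_pages)  # ['Page1', 'Page2', 'Page3']
```

Page4 never enters the corpus because no crawled page links to it.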
What is Information Retrieval?
A good formal definition of information retrieval is
given in Baeza-Yates & Ribeiro-Neto (1999, p. 1)
“Information retrieval deals with representation,
storage, organization of, and access to information
items. The organization and access of information items
should provide the user with easy access to the
information in which he is interested”
The definition incorporates all the important features of a
good information retrieval system:
Representation
Storage
Organization
Access
The focus is on the user information need
[Figure: Overview of the retrieval process (two slides)]
The Retrieval Process
The text collection must be defined before any
retrieval process is initiated.
This is usually done by the manager of the database and
includes specifying the following:
The documents to be used
The operations to be performed on the text
The text model to be used (the text structure and what elements
can be retrieved)
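As an illustration of "operations to be performed on the text", the sketch below turns raw text into a simple logical view: a list of lower-cased index terms with common words removed. The stopword list and the function name are assumptions made for the example, not a fixed standard.

```python
import re

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "the", "to"})


def logical_view(text, stopwords=STOPWORDS):
    """Lower-case the text, keep alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stopwords]


view = logical_view("The organization of, and access to, information items.")
print(view)  # ['organization', 'access', 'information', 'items']
```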
[Figure: the retrieval process. Labelled components: User, Query Language, Text Operations, logical view (of the query and of the documents), DB manager, Indexing Module, Index, Searching, feedback, Operations]
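The indexing and searching components are often built around an inverted index, which maps each term to the documents containing it. A minimal sketch, assuming whitespace tokenisation and integer document identifiers assigned in input order (both are simplifications chosen for the example):

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents, start=1):  # ids start at 1
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


documents = [
    "storage and retrieval of information",
    "information storage systems",
    "browsing the web",
]
index = build_index(documents)
print(search_all(index, "information storage"))  # [1, 2]
```

Searching then touches only the postings of the query terms instead of scanning every document.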
Comparing representations
to identify relevant documents:
What weighting scheme and similarity measure should be used?
What is a “good” model of retrieval?
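One common answer, shown here only as an example and not as the choice made in these notes, is TF-IDF weighting combined with cosine similarity. The sketch assumes whitespace tokenisation and a smoothed IDF variant (`log(N / df) + 1`); all names and the sample documents are invented.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    n = len(docs)
    tokenised = [text.lower().split() for text in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed IDF (an assumed variant): terms in every document keep weight 1.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenised
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(query, docs):
    """Score every document against the query; unseen query terms are ignored."""
    vectors, idf = tf_idf_vectors(docs)
    counts = Counter(query.lower().split())
    q = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    return [cosine(q, vector) for vector in vectors]


docs = ["information retrieval", "information storage", "web browsing"]
scores = score("retrieval", docs)
print(scores)
```

Only the first document contains the query term, so it is the only one with a non-zero score.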
Documents: assign a document identifier