01 Introduction To ISR
Chapter One: Introduction to ISR
Information Retrieval Systems?
• Document (Web page) retrieval in response to a query
• Quite effective (at some things)
• Commercially successful (some of them)
• But what goes on behind the scenes?
• How do they work?
• What happens beyond the Web?
Web search systems
• Lycos, Excite, Yahoo, Google, Live, Northern Light, Teoma, HotBot
Examples of IR systems
• Conventional (library catalog): search by keyword, title, author, etc.
• E.g., you are probably familiar with www.library.unt.edu or the AAU library catalog
[Figure: the USER accesses the DB through Retrieval and Browsing]
The User Task: Retrieval
• Retrieval is the process of retrieving information whereby the main objective is clearly defined from the onset of the searching process
• The user of a retrieval system has to translate his or her information need into a query in the language provided by the system
• In this context (i.e., by specifying a set of words), the user searches for useful information by executing a retrieval task
• English language statement:
I want a book by J. K. Rowling titled The Chamber of Secrets
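To make the translation step concrete, here is a minimal Python sketch of a fielded catalog search. The catalog records, the field names `author` and `title`, and the `search` function are hypothetical; a real library catalog exposes its own query language, which the user would have to learn.

```python
# A tiny, made-up catalog; the records and field names are illustrative only.
CATALOG = [
    {"author": "J. K. Rowling", "title": "Harry Potter and the Chamber of Secrets"},
    {"author": "J. K. Rowling", "title": "Harry Potter and the Philosopher's Stone"},
    {"author": "J. R. R. Tolkien", "title": "The Two Towers"},
]


def search(catalog, **fields):
    """Return records where every requested field contains the given
    value as a case-insensitive substring (a simple conjunctive query)."""
    results = []
    for record in catalog:
        if all(value.lower() in record.get(name, "").lower()
               for name, value in fields.items()):
            results.append(record)
    return results


# "I want a book by J. K. Rowling titled The Chamber of Secrets",
# translated by the user into this (hypothetical) query language:
hits = search(CATALOG, author="Rowling", title="Chamber of Secrets")
for record in hits:
    print(record["title"])
```

The English statement becomes two field constraints that must both hold; the system only understands the constraints, not the original sentence.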
The User Task: Browsing
• Browsing is the process of retrieving information whereby the main objective is not clearly defined from the beginning and whose purpose might change during the interaction with the system
[Figure: an IR System takes a Query String and a Document corpus as input and returns Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
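The input/output contract of this diagram can be sketched in a few lines of Python. This assumes a made-up three-document corpus and a deliberately naive score (how many times the query words occur in each document); real systems use term weighting and a similarity measure instead.

```python
import re
from collections import Counter

# Hypothetical toy corpus: document name -> text.
CORPUS = {
    "Doc1": "information retrieval systems rank documents for a query",
    "Doc2": "a library catalog supports search by title and author",
    "Doc3": "web search engines crawl and index web pages",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, corpus):
    """Score each document by how many times query words occur in it and
    return (name, score) pairs for documents with a non-zero score, best first."""
    query_terms = set(tokenize(query))
    scores = {}
    for name, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[name] = score
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for position, (name, score) in enumerate(rank("web search query", CORPUS), start=1):
    print(f"{position}. {name} (score {score})")
# 1. Doc3 (score 3)
# 2. Doc1 (score 1)
# 3. Doc2 (score 1)
```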
Web Search System (e.g., Google)
[Figure: a Web crawler (Web spider) collects the Document corpus; the IR System takes a Query String and returns Ranked Documents (1. Page1, 2. Page2, 3. Page3, ...)]
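The crawler's job of building the document corpus can be sketched as a breadth-first traversal of links. To keep the example self-contained, the "Web" below is a hypothetical in-memory dictionary of pages and links; a real crawler would fetch pages over HTTP, parse links out of the HTML, and respect politeness rules, none of which is shown here.

```python
from collections import deque

# Hypothetical in-memory "Web": page URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("home page about retrieval", ["http://a.example/x", "http://b.example/"]),
    "http://a.example/x": ("notes on indexing", ["http://a.example/"]),
    "http://b.example/": ("web search basics", ["http://c.example/"]),
    "http://c.example/": ("crawling the web", []),
}


def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    Returns the document corpus as a dict of URL -> text. Each URL is
    queued at most once; links to pages missing from `web` are skipped.
    """
    corpus = {}
    frontier = deque(seeds)
    seen = set(seeds)
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        if url not in web:
            continue  # e.g. a broken link
        text, links = web[url]
        corpus[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return corpus


corpus = crawl(["http://a.example/"], WEB)
print(list(corpus))
```

The `seen` set prevents the crawler from looping forever on cyclic links (here, `a.example/x` links back to the home page).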
What is Information Retrieval?
Information retrieval is a broad area of computer science focusing on easy access to information. As defined in Baeza-Yates & Ribeiro-Neto (2011, p. 1):
“Information retrieval deals with representation, storage, organization of, and access to information items, such as documents, Web pages, online catalogs, structured and semi-structured records, multimedia objects. The organization and access of information items should provide the user with easy access to the information in which he is interested”
The definition incorporates all the important features of a good information retrieval system:
• Representation
• Storage
• Organization
• Access
The focus is on the user's information need
Overview of the Retrieval process
The Retrieval Process
• It is necessary to define the text collection before any of the retrieval processes are initiated
• This is usually done by the manager of the database and includes specifying the following:
• The documents to be used
• The operations to be performed on the text
• The text model to be used (the text structure and what elements can be retrieved)
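The three things the database manager specifies can be pictured as a small configuration object. The `CollectionSpec` class and its field names below are hypothetical, invented only to illustrate the idea: the documents, an ordered list of text operations, and a text model reduced to "which fields can be retrieved".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CollectionSpec:
    """Hypothetical description of a text collection, set up before retrieval."""
    documents: Dict[str, Dict[str, str]]          # doc id -> {field name: text}
    text_operations: List[Callable[[str], str]]   # applied in order to each field
    retrievable_fields: List[str] = field(default_factory=list)  # simplified text model

    def prepare(self):
        """Apply the text operations to the retrievable fields of every document."""
        prepared = {}
        for doc_id, doc_fields in self.documents.items():
            prepared[doc_id] = {}
            for name in self.retrievable_fields:
                text = doc_fields.get(name, "")
                for operation in self.text_operations:
                    text = operation(text)
                prepared[doc_id][name] = text
        return prepared


spec = CollectionSpec(
    documents={
        "d1": {
            "title": "Introduction to ISR",
            "body": "Information  Retrieval deals with   representation",
            "notes": "internal",  # not part of the text model, so never retrieved
        },
    },
    # Lower-case the text, then collapse runs of whitespace.
    text_operations=[str.lower, lambda s: re.sub(r"\s+", " ", s).strip()],
    retrievable_fields=["title", "body"],
)
print(spec.prepare())
# {'d1': {'title': 'introduction to isr', 'body': 'information retrieval deals with representation'}}
```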
[Figure: the retrieval process, with components User feedback, Query Operations, Text Operations (producing a logical view), DB Manager Module, Indexing, Searching, Index, and documents]
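Indexing Subsystem
• documents → Assign document identifier → text, document IDs
• text → Tokenize → tokens
• tokens → Stop list → non-stoplist tokens
• non-stoplist tokens → Stemming & Normalize → stemmed terms
• stemmed terms → Term weighting → terms with weights
• terms with weights → Index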
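The indexing steps above can be sketched as one Python function. Several choices here are assumptions for illustration: the short stop list, a crude suffix-stripping `stem` function standing in for a real stemmer such as Porter's, and tf-idf (with idf = log(N / df)) as the term-weighting scheme. The resulting index is an inverted index mapping each term to the documents that contain it.

```python
import math
import re
from collections import Counter, defaultdict

# Small illustrative stop list; real systems use much longer lists.
STOP_LIST = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(texts):
    """Run the indexing pipeline and return (doc_ids, inverted_index).

    inverted_index maps each stemmed term to {doc_id: tf-idf weight}.
    """
    # 1. Assign document identifiers.
    docs = {doc_id: text for doc_id, text in enumerate(texts, start=1)}
    term_freqs = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)                              # 2. Tokenize
        tokens = [t for t in tokens if t not in STOP_LIST]   # 3. Stop list
        terms = [stem(t) for t in tokens]                    # 4. Stemming & normalize
        term_freqs[doc_id] = Counter(terms)
    # 5. Term weighting: tf * idf, with idf = log(N / df).
    n_docs = len(docs)
    doc_freq = Counter(term for tf in term_freqs.values() for term in tf)
    index = defaultdict(dict)
    for doc_id, tf in term_freqs.items():
        for term, count in tf.items():
            index[term][doc_id] = count * math.log(n_docs / doc_freq[term])
    return list(docs), dict(index)                           # 6. Index


doc_ids, index = build_index([
    "Indexing the documents",
    "Searching indexed documents",
    "The user searches the Web",
])
print(doc_ids)
print(sorted(index))
```

Note how stemming lets "Indexing" and "indexed" (or "Searching" and "searches") share one index entry, while "the" never reaches the index at all.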
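Searching Subsystem
• query → Parse query → query tokens
• query tokens → Stop list → non-stoplist tokens
• non-stoplist tokens → Stemming & Normalize → stemmed terms
• stemmed terms → Query term weighting → query terms
• query terms + index terms (from the Index) → Similarity measure → relevant document set
• relevant document set → Ranking → ranked document set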
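The searching steps can be sketched in Python as well. This sketch assumes a small hypothetical index with made-up term weights (as an indexing subsystem might produce), raw term frequency as the query term weighting, and cosine similarity as the similarity measure; these are illustrative choices, not the only ones. The query must pass through the same stop list and stemmer that were used at indexing time, or its terms will not match the index terms.

```python
import math
import re
from collections import Counter

STOP_LIST = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

# Hypothetical pre-built index: stemmed term -> {doc_id: term weight}.
INDEX = {
    "index": {1: 0.4, 2: 0.4},
    "document": {1: 0.4, 2: 0.4},
    "search": {2: 0.4, 3: 0.4},
    "user": {3: 1.1},
    "web": {3: 1.1},
}


def stem(token):
    """Crude suffix stripping; must match the stemmer used at indexing time."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process_query(query):
    """Parse the query, drop stop words, stem, and weight terms by frequency."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_LIST)


def search(query, index):
    """Return (doc_id, cosine similarity) pairs, highest similarity first."""
    query_weights = process_query(query)
    # Dot product between the query vector and each document vector.
    dot = Counter()
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, {}).items():
            dot[doc_id] += q_weight * d_weight
    # Squared document vector lengths, computed from all index terms.
    doc_norm_sq = Counter()
    for postings in index.values():
        for doc_id, d_weight in postings.items():
            doc_norm_sq[doc_id] += d_weight ** 2
    q_norm = math.sqrt(sum(w ** 2 for w in query_weights.values()))
    # Relevant set: documents sharing at least one term with the query.
    scores = {
        doc_id: value / (q_norm * math.sqrt(doc_norm_sq[doc_id]))
        for doc_id, value in dot.items()
    }
    # Ranking: sort the relevant set by similarity (ties broken by doc id).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for doc_id, score in search("searching the web", INDEX):
    print(doc_id, round(score, 3))
```

Document 3 ranks above document 2 because it matches both query terms ("search" and "web"), while document 2 matches only "search"; document 1 shares no query term and is not in the relevant set.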
End of Chapter One