Chapter One IR
Introduction to ISR
Target Group: IT 3rd-year students
Injibara, Ethiopia
Course Outline
Topic(s): Details
• Overview of IR: Define IR; The retrieval process; Basic structure of an IR system
• Text Document Operations: Basic Laws in IR; Tokenization; Stop word detection; Stemming; Normalization; Term weighting; Similarity measures
• Indexing Structures: The need for indexing; Sequential file; Inverted files
• IR Models: A Formal Characterization of IR Models; Boolean model, Vector space model & Probabilistic model
• Retrieval Evaluation: Evaluation of IR systems; Relevance judgement; Retrieval effectiveness measures (Recall, Precision, F-measure, etc.)
• Query Languages: Types of Query formulation; Keyword-based queries (Boolean queries); Pattern matching; Natural language queries
• Current Issues in IR: IR in Local Languages; Information Extraction; Information Filtering; Text Summarization, Cross-language retrieval...
Text Collections and IR
• Information is organized into a large number of documents
– Large collections of documents are available from various sources:
books, magazines, newspapers, journal articles, conference
papers, digital libraries, Web pages, etc.
[Figure: a query and a document collection go into the IR system, which returns an answer list]
• Goal
– Find documents relevant to an information need from a
large document set
What is Information Retrieval?
[Figure: the IR system shown as a black box between the user and the documents]
Typical IR System Architecture
[Figure: a query string and a document corpus go into the IR system, which outputs ranked relevant documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
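The architecture above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular IR system: the function name `rank_documents`, the toy corpus, and the scoring rule (counting occurrences of query terms) are all assumptions made for the example.

```python
from collections import Counter

def rank_documents(query, corpus):
    """Return (doc_id, score) pairs, best first.

    The score is a toy measure (an assumption for this sketch):
    the number of times any query term occurs in the document.
    """
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents sharing no term with the query are dropped
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical document corpus and query string.
corpus = {
    "Doc1": "information retrieval finds relevant documents",
    "Doc2": "a web spider collects pages",
    "Doc3": "retrieval of information from large document collections",
}
ranked = rank_documents("information retrieval", corpus)
for position, (doc_id, score) in enumerate(ranked, start=1):
    print(f"{position}. {doc_id} (score {score})")
```

Real systems replace the toy score with a retrieval model such as the Boolean, vector space, or probabilistic model listed in the course outline.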
The Notion of Relevance
• Relevance is a subjective judgment and may include:
– Being timely (recent information)
– Being authoritative (from a trusted source)
– Satisfying the goals of the user and his/her intended use of the
information (information need)
• Relevant information is information suited to your
information need
• What is actually needed (relevant)
– Depends on: user, space/time, group, and context
[Figure: Web spider, document corpus, query string, and IR system, producing ranked relevant documents (1. Page1, 2. Page2, 3. Page3, ...)]
The Retrieval Process
[Figure: user interface, user need, text operations, logical view, and text database]
• The user first specifies a user need, which is then parsed and
transformed by the same text operations that are applied to the documents.
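A minimal sketch of this idea, assuming a very small set of text operations (lowercasing, whitespace tokenization, punctuation stripping, and stop word removal). The function name `text_operations` and the stop word list are hypothetical; the point is only that one function produces the logical view of both the documents and the query.

```python
# A tiny, illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "in", "and", "for"}

def text_operations(text):
    """Turn raw text into a list of index terms (its logical view)."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [tok.strip(".,;:!?\"'()") for tok in text.lower().split()]
    # Drop empty tokens and stop words.
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

# The same function is applied to a document and to the user's query.
doc_view = text_operations("The Retrieval of Information is hard.")
query_view = text_operations("information retrieval")

print(doc_view)    # ['retrieval', 'information', 'hard']
print(query_view)  # ['information', 'retrieval']
```

Because both sides pass through the same operations, a query term such as "Information" and a document term such as "information" end up in the same form and can be matched.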
3. Comparing representations
– What is a “good” similarity measure & retrieval model?
– How is uncertainty represented?
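As one possible answer to the similarity question, the sketch below compares two texts with cosine similarity over raw term-frequency vectors. Cosine similarity is chosen here only as a common example; the slides do not prescribe it, and the function name `cosine_similarity` and the whitespace tokenization are assumptions for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

# One shared term out of two on each side gives a similarity of about 0.5.
print(cosine_similarity("information retrieval", "information systems"))
```

A score near 1 means the query and the document use the same terms in similar proportions; a score of 0 means they share no terms at all.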