Chapter One IR

The document provides an overview of Information Storage and Retrieval (ISR) aimed at IT 3rd year students, covering topics such as the retrieval process, indexing structures, IR models, and evaluation metrics. It highlights the challenges of processing large collections of documents and the importance of relevance in information retrieval. Additionally, it discusses various types of IR systems and their architecture, emphasizing the need for effective query formulation and user feedback in the retrieval process.

Uploaded by

bekeletamirat931

Information Storage and Retrieval

Chapter One
Introduction to ISR
Target Group: IT 3rd-year students

Injibara, Ethiopia
Course Outline

Topic(s)                  Details
Overview of IR            Define IR; The retrieval process; Basic structure of an IR system
Text Document Operations  Basic Laws in IR; Tokenization; Stop word detection; Stemming; Normalization; Term weighting; Similarity measures
Indexing Structures       The need for indexing; Sequential file; Inverted files
IR Models                 A Formal Characterization of IR Models; Boolean model, Vector space model & Probabilistic model
Retrieval Evaluation      Evaluation of IR systems; Relevance judgement; Retrieval effectiveness measures (Recall, Precision, F-measure, etc.)
Query Languages           Types of Query formulation; Keyword-based queries (Boolean queries); Pattern matching; Natural language queries
Current Issues in IR      IR in Local Languages; Information Extraction; Information Filtering; Text Summarization, Cross-language retrieval...
Text Collections and IR
• Information is organized into (a large number of) documents
– Large collections of documents are available from various sources: books, magazines, newspapers, journal articles, conference papers, digital libraries, Web pages, etc.

• Example: How Much Data?
– Google processes 20 petabytes (PB) a day (2008)
– The Google Web search engine claims to index over 30 trillion pages (1995-2014)
– It handles more than 40,000 search queries each second on average, over 5.2 billion searches per day in 2017, and 4 trillion per year worldwide
– Wayback Machine has 50 PB of used storage (2014)
– Facebook has 100 PB of user data (2012)
– eBay has 6.5 PB of user data + 50 TB/day (2009)
Storage of Text
• Textual Documents
– Searchable as text
– Words are represented as ASCII/Unicode
• Image Documents
– A scanned image of a text document is not searchable as text: text (characters, words, etc.) is represented as patterns of pixels
– Retrieval from document images: two options
• Recognition-based retrieval: OCR is required to convert document images to ASCII (may be error-prone); text IR systems are then applied to the recognized documents
• Recognition-free retrieval: retrieval from document images without explicit recognition, searching for relevant documents directly in the image collection
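To make the point about text representation concrete, the following is a minimal sketch of how a searchable textual document stores words as character codes. The sample words and the helper names `code_points` and `utf8_length` are illustrative assumptions, not part of the course material.

```python
# Hypothetical sample words, chosen only for illustration.
ascii_word = "retrieval"
accented_word = "caf\u00e9"  # "cafe" with an accented final letter, written as an escape


def code_points(word):
    """Return the Unicode code point of each character in the word."""
    return [ord(ch) for ch in word]


def utf8_length(word):
    """Return how many bytes the word occupies when stored as UTF-8."""
    return len(word.encode("utf-8"))


print(code_points(ascii_word))
print(utf8_length(ascii_word), utf8_length(accented_word))
```

Because every character has a code, two occurrences of the same word compare as equal, which is what makes text directly searchable. A scanned page has no such codes until OCR produces them.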
The Problem of IR
• Need
– The size and number of published documents keep increasing
– Traditional methods had difficulties in document processing
– Different disciplines (Biotechnology, Genetics, ..) produce huge amounts of different types of data

[Figure: an information need is expressed as a query; the IR system performs retrieval over the document collection and returns an answer list]

• Goal
– Find documents relevant to an information need from a large document set
What is Information Retrieval?

• Information retrieval is the process of searching a large unstructured corpus for relevant documents that satisfy the information need of users
– It is a tool that finds and selects from a collection of items a subset that serves the user’s purpose

• Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
Examples of IR System
• Most IR systems focus specifically on text retrieval, but there are many other IR areas:
– Cross-language retrieval, text summarization, information filtering, question answering, content-based multimedia (audio, image and video) retrieval
• Text-based (Lexis-Nexis, Google, FAST):
– Search by keywords.
– Limited search using queries in natural language.
• Multimedia (WebSeek, SaFe):
– Search by visual features (shapes, colors, …).
• Question answering systems (AskJeeves, Answerbus):
– Search in (restricted) natural language
• Cross-language vs. multilingual information retrieval
Information Retrieval Serves as a Bridge
• An Information Retrieval System serves as a bridge between the world of authors and the world of readers/users
• That is, writers present a set of ideas in a document using a set of concepts
• Users then query the IR system for relevant documents that satisfy their information need

[Figure: the IR system as a black box between the user and the documents]
Typical IR System Architecture

[Figure: a query string and a document corpus are the inputs to the IR system, which outputs a ranked list of relevant documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
The Notion of Relevance
• Relevance is a subjective judgment and may include:
– Being timely (recent information)
– Being authoritative (from a trusted source)
– Satisfying the goals of the user and his/her intended use of the information (information need)
• Relevant information is information suited to your information need
• What is actually needed (relevant) depends on the user, space/time, group and context
• Relevance is a central concern of IR

IR System vs. Web Search System

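[Figure: a spider crawls the Web to build the document corpus; a query string and the corpus are the inputs to the IR system, which outputs a ranked list of relevant documents (1. Page1, 2. Page2, 3. Page3, ...)]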
The Retrieval Process
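[Figure: the retrieval process. The user states a user need through the user interface; text operations produce a logical view of both the query and the documents in the text database; query formulation (refined by user feedback) produces the query; indexing builds the inverted file index; searching the index yields the retrieved docs; ranking yields the ranked docs returned to the user]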
• It is necessary to define the text database before any of
the retrieval processes are initiated
• The text operations transform the original documents & the
information needs and generate a logical view of them
• Once the logical view of the documents is defined, the
database module builds an index of the text
– An index is a critical data structure
– It allows fast searching over large volumes of data
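The text operations step described above can be sketched as follows. This is a minimal illustration, assuming only lowercasing, tokenization and stop word removal; the `STOP_WORDS` set and the function name `text_operations` are hypothetical, and stemming is left out.

```python
import re

# Illustrative stop word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def text_operations(text):
    """Turn raw text into a logical view: a list of index terms."""
    # Normalize case and split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words, which carry little content.
    return [t for t in tokens if t not in STOP_WORDS]


doc = "The index is a critical data structure."
print(text_operations(doc))  # ['index', 'critical', 'data', 'structure']
```

The list of terms returned here plays the role of the logical view: it is what the database module would index instead of the raw text.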
• Different index structures might be used, but the most popular one is the inverted file (more on this later), as shown in the retrieval process diagram.
• Once the document database is indexed, the retrieval process can be initiated.
• The user first specifies a user need, which is then parsed & transformed by the same text operations applied to the text.
– Next, query operations are applied before the actual query, which provides a system representation for the user need, is generated.
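As a rough sketch of the steps above, the code below builds an inverted file and then passes the user need through the same text operations before searching. The three sample documents, the stop word list and the function names are assumptions made for illustration, and the query is interpreted as a Boolean AND of its terms, which is only one possible choice.

```python
import re
from collections import defaultdict

# Illustrative stop word list (an assumption).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def text_operations(text):
    """Lowercase, tokenize and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_file(docs):
    """Map each term to the sorted list of DocIDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text_operations(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, user_need):
    """Return the DocIDs that contain every query term (Boolean AND)."""
    terms = text_operations(user_need)  # same operations as for documents
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical three-document collection.
docs = {
    1: "Indexing of text documents",
    2: "The inverted file is an index structure",
    3: "Searching the text database",
}
index = build_inverted_file(docs)
print(index["text"])                         # [1, 3]
print(search(index, "the text database"))    # [3]
```

Searching touches only the postings of the query terms rather than scanning every document, which is why the index allows fast searching over large collections.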
• The query is then processed to retrieve documents.
– Before being sent to the user, the retrieved documents are ranked according to their likelihood of relevance.
• The user then examines the set of ranked documents in search of useful information
• At this point, the user might pinpoint a subset of the documents seen as definitely of interest & initiate a user feedback cycle
– In such a cycle, the system uses the documents selected by the user to change the query formulation
– Hopefully, this modified query is a better representation of the real user need
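The ranking and feedback cycle above can be sketched as follows. This is a deliberately crude illustration: the score is a plain count of query term occurrences, and the feedback step simply adds the most frequent new term from the documents the user selected. Both choices, the sample documents and all function names are assumptions, not the method of any particular IR system.

```python
import re
from collections import Counter


def terms(text):
    """Lowercase and tokenize a piece of text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query_terms):
    """Order DocIDs by how often the query terms occur (a crude relevance score)."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(terms(text))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by DocID.
    return sorted(scores, key=lambda d: (-scores[d], d))


def feedback(docs, query_terms, selected_ids, extra=1):
    """Extend the query with the most frequent new terms of the selected docs."""
    pool = Counter()
    for doc_id in selected_ids:
        pool.update(t for t in terms(docs[doc_id]) if t not in query_terms)
    new_terms = sorted(pool, key=lambda t: (-pool[t], t))
    return list(query_terms) + new_terms[:extra]


# Hypothetical collection: the user wants the animal, not the car.
docs = {
    1: "jaguar speed jaguar habitat",
    2: "jaguar car engine speed",
    3: "big cat habitat habitat forest",
}
query = ["jaguar"]
first = rank(docs, query)                        # [1, 2]
query2 = feedback(docs, query, selected_ids=[1])  # ['jaguar', 'habitat']
second = rank(docs, query2)                      # [1, 3, 2]
print(first, query2, second)
```

After one feedback cycle, document 3, which never mentions the original query term, is retrieved and ranked above the unrelated car document.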
Issues that arise in IR
1. Text document representation
– What makes a “good” representation?
– How is a representation generated from text?
– What are the retrievable objects & how are they organized?

2. Information need representation
– What is an appropriate query language?
– How can interactive query formulation & refinement be supported?

3. Comparing representations
– What is a “good” similarity measure & retrieval model?
– How is uncertainty represented?

4. Evaluating effectiveness of retrieval
– What are good metrics?
– What constitutes a good experimental test bed?
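As a small illustration of the metrics question in issue 4, the sketch below computes the three effectiveness measures named in the course outline (precision, recall and F-measure) for one query. The function name `evaluate` and the sample DocIDs are assumptions; the relevant set would normally come from human relevance judgements.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, F-measure) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical judgement: docs 2, 4 and 5 are relevant; the system returned 1-4.
p, r, f = evaluate(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Precision penalizes returning irrelevant documents, recall penalizes missing relevant ones, and the F-measure combines the two into a single number.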
Students’ Reflection:
What are the main components of an Information Retrieval System?
a) ____________________________________
b) ____________________________________
c) ____________________________________

What are the main differences between an Information Retrieval System and a Database Management System?
a) ____________________________________
b) ____________________________________
c) ____________________________________