Information Retrieval: Prof: Ehab Ezzat Hassanein

The document outlines the process of constructing and querying inverted indexes in information retrieval systems. Key stages include tokenization, normalization, stemming, and handling stop words, followed by indexing steps that involve sorting and creating a dictionary and postings. Additionally, it discusses efficient storage requirements and query processing techniques, particularly for AND queries.

Uploaded by yahia mohamed


Constructing and Querying Inverted Indexes

Inverted Index construction

Initial stages of text processing
● Tokenization
– Cut the character sequence into word tokens
– Deal with cases such as “John’s” and “a state-of-the-art solution”
● Normalization
– Map text and query terms to the same form
– e.g., USA and U.S.A. should match
● Stemming
– We may wish different forms of a root to match
– e.g., authorize and authorization
● Stop words
– We may omit very common words (or not!)
– e.g., the, a, to, of
– But consider a query for the song “To be or not to be”: every word in it is a very common word that a stop list may drop!
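The four stages above can be sketched in Python as a small pipeline. This is a minimal illustration, not a production analyzer: the tokenizer regex, the rule that drops a possessive "'s", and the suffix list are hypothetical toy choices (the stemmer is a crude suffix stripper, not the Porter stemmer), only the ASCII apostrophe is handled, and the stop list is just the four words from the slide.

```python
import re

# Stop list from the slide; real systems may use a longer list, or none at all.
STOP_WORDS = {"the", "a", "to", "of"}

# Hypothetical toy suffix list, tried in order (longest first). NOT Porter.
SUFFIXES = ("ization", "ation", "ize", "s")


def tokenize(text):
    """Cut a character sequence into tokens.

    Letters, digits, periods, ASCII apostrophes and hyphens stay inside a
    token, so "John's", "state-of-the-art" and "U.S.A." each remain one token.
    """
    pieces = re.split(r"[^\w.'-]+", text)
    # Drop empty pieces and pieces made only of punctuation.
    return [p for p in pieces if p.strip(".'-")]


def normalize(token):
    """Map a token to a canonical form: case-fold, delete periods, drop "'s"."""
    token = token.lower().replace(".", "")
    if token.endswith("'s"):
        token = token[:-2]
    return token


def stem(term):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def preprocess(text, remove_stop_words=True):
    """Tokenize, normalize, optionally drop stop words, then stem."""
    terms = [t for t in (normalize(tok) for tok in tokenize(text)) if t]
    if remove_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]
    return [stem(t) for t in terms]


print(preprocess("USA"), preprocess("U.S.A."))                # ['usa'] ['usa']
print(preprocess("authorize"), preprocess("authorization"))  # ['author'] ['author']
print(preprocess("to be or not to be"))                       # ['be', 'or', 'not', 'be']
```

Note how even this short stop list already mangles the song query; passing `remove_stop_words=False` keeps it intact.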
Indexer Steps: Token Sequence
● Sequence of (modified token, document ID) pairs
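The slide's example documents did not survive extraction, so the sketch below uses a hypothetical two-document collection. It assumes documents arrive as a dict from docID to raw text, and it "modifies" tokens only by lower-casing alphanumeric runs; a real indexer would apply the full preprocessing pipeline instead.

```python
import re


def token_sequence(docs):
    """Return (modified token, docID) pairs in the order tokens are read.

    `docs` maps a docID to raw text. Here "modified" only means lower-cased
    alphanumeric runs; a real indexer would apply the full preprocessing.
    """
    pairs = []
    for doc_id in sorted(docs):
        for token in re.findall(r"[a-z0-9]+", docs[doc_id].lower()):
            pairs.append((token, doc_id))
    return pairs


# Hypothetical two-document collection.
docs = {1: "Brutus killed Caesar", 2: "Caesar was ambitious"}
print(token_sequence(docs))
# [('brutus', 1), ('killed', 1), ('caesar', 1), ('caesar', 2), ('was', 2), ('ambitious', 2)]
```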

Indexer Steps: Sort
● Sort the (term, document ID) pairs by term, then by document ID
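Sorting is the core indexing step. A minimal in-memory sketch follows; it assumes the whole token sequence fits in memory (large collections need an external sort), and the input pairs are hypothetical.

```python
def sort_pairs(pairs):
    """Sort (term, docID) pairs by term, then by docID.

    Python compares tuples element by element, so the default ordering
    already gives term first and docID second.
    """
    return sorted(pairs)


# Hypothetical token sequence from two documents.
pairs = [("brutus", 1), ("killed", 1), ("caesar", 1),
         ("caesar", 2), ("was", 2), ("ambitious", 2)]
print(sort_pairs(pairs))
# [('ambitious', 2), ('brutus', 1), ('caesar', 1), ('caesar', 2), ('killed', 1), ('was', 2)]
```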

Indexer Steps: Dictionary and Postings
● Multiple entries of a term in a single document are merged
● The result is split into a Dictionary and Postings
● Document frequency information is added
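The three operations in this step can be sketched as one pass over the sorted pairs. This is an illustration under the assumption that the input is already sorted by term and then docID (so duplicate entries are adjacent); representing the dictionary and postings as two plain Python dicts is a simplification chosen here, not a storage layout from the slides.

```python
from itertools import groupby


def build_index(sorted_pairs):
    """Turn (term, docID) pairs, sorted by term then docID, into
    a dictionary {term: document frequency} and postings {term: [docIDs]}."""
    dictionary = {}
    postings = {}
    for term, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        doc_ids = []
        for _, doc_id in group:
            # Merge repeated entries of the term within the same document.
            if not doc_ids or doc_ids[-1] != doc_id:
                doc_ids.append(doc_id)
        postings[term] = doc_ids
        dictionary[term] = len(doc_ids)  # document frequency (df)
    return dictionary, postings


# Hypothetical sorted pairs; "caesar" occurs twice in document 1.
pairs = [("brutus", 1), ("caesar", 1), ("caesar", 1), ("caesar", 2), ("was", 2)]
dictionary, postings = build_index(pairs)
print(dictionary)  # {'brutus': 1, 'caesar': 2, 'was': 1}
print(postings)    # {'brutus': [1], 'caesar': [1, 2], 'was': [2]}
```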

Where do we pay in storage?
● Terms: ~500K
● Pointers: ~500K (one per term, pointing to its postings list)
● Postings lists: bounded by the number of tokens. In our example, 1M documents × 1,000 words per document on average ⇒ at most 1 billion entries (fewer, since repeated terms within a document are merged)
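The postings bound is simple arithmetic. The snippet reproduces it and, under an added assumption not stated on the slide (uncompressed 4-byte docIDs), turns it into a rough upper bound in bytes.

```python
NUM_DOCS = 1_000_000          # from the slide
AVG_WORDS_PER_DOC = 1_000     # from the slide
BYTES_PER_DOC_ID = 4          # assumption: uncompressed 32-bit docIDs

# Each token contributes at most one posting, so tokens bound postings.
max_postings = NUM_DOCS * AVG_WORDS_PER_DOC
max_postings_bytes = max_postings * BYTES_PER_DOC_ID

print(f"{max_postings:,} postings at most")          # 1,000,000,000 postings at most
print(f"{max_postings_bytes / 1e9:.0f} GB at most")  # 4 GB at most
```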
Efficient IR System Implementation
● How do we index efficiently?
● How much storage do we need?

Query Processing with an Inverted Index

Query Processing: AND
● Consider processing the query: Brutus AND Caesar
● 1. Locate Brutus in the Dictionary
● 2. Retrieve its postings
● 3. Locate Caesar in the Dictionary
● 4. Retrieve its postings
● 5. Merge the two postings lists (intersect the document sets)
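The five steps map directly onto a dictionary lookup followed by an intersection. In this sketch the index is assumed to be a Python dict from term to a sorted docID list, query terms are assumed to be already normalized to match the index keys, and the postings lists are hypothetical. Step 5 uses a set intersection for brevity; it yields the same documents as a merge of the sorted lists.

```python
def and_query(term1, term2, index):
    """Answer `term1 AND term2` using an index {term: sorted list of docIDs}."""
    # Steps 1-2: locate term1 in the dictionary and retrieve its postings
    # (a term missing from the dictionary has an empty postings list).
    postings1 = index.get(term1, [])
    # Steps 3-4: the same for term2.
    postings2 = index.get(term2, [])
    # Step 5: intersect the two document sets. A set intersection is used
    # here for brevity; a linear merge of the sorted lists avoids the sets.
    return sorted(set(postings1) & set(postings2))


# Hypothetical postings lists.
index = {
    "brutus": [2, 4, 8, 16, 32, 64, 128],
    "caesar": [1, 2, 3, 5, 8, 13, 21, 34],
}
print(and_query("brutus", "caesar", index))  # [2, 8]
```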

Algorithm for the merging of two postings lists
(Figure: result of merging the example postings lists: 2, 8)

Algorithm for the merging of two postings lists
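The slide's pseudocode did not survive extraction. Below is a sketch of the textbook linear-time merge, which may differ in detail from what the slide showed. It assumes both postings lists are sorted ascending by docID with no duplicates; the example lists are hypothetical, chosen so the result matches the 2, 8 above.

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by docID.

    Walks both lists once with two pointers, so it runs in
    O(len(p1) + len(p2)) time.
    """
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same docID in both lists: the document satisfies the AND.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Advance whichever pointer is on the smaller docID.
            i += 1
        else:
            j += 1
    return answer


# Hypothetical postings lists for Brutus and Caesar.
print(intersect([2, 4, 8, 16, 32, 64, 128], [1, 2, 3, 5, 8, 13, 21, 34]))  # [2, 8]
```

Because each comparison advances at least one pointer, the loop never revisits an entry, which is why sorted postings lists matter.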

