
Solution.: Increase - 3

The question asks to calculate the precision and recall of an IR system given the following information: 1) The system returned 3 relevant documents 2) It also returned 2 irrelevant documents 3) There are a total of 8 relevant documents in the collection The precision is 3/5 = 0.6 since there were 3 true positives out of the 5 documents returned. The recall is 3/8 = 0.375 since it returned 3 of the 8 relevant documents.

Uploaded by

Ehab Emam


Q1.

Draw the inverted index that would be built for the following document collection

Doc 1 new home sales top forecasts

Doc 2 home sales rise in july

Doc 3 increase in home sales in july

Doc 4 july new home sales rise

SOLUTION. Inverted index:
forecast -> 1
home -> 1 -> 2 -> 3 -> 4
in -> 2 -> 3
increase -> 3
july -> 2 -> 3 -> 4
new -> 1 -> 4
rise -> 2 -> 4
sale -> 1 -> 2 -> 3 -> 4
top -> 1
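
The construction for Q1 can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's procedure: the function name `build_inverted_index` is hypothetical, and tokens are only lowercased and split on whitespace, so it produces `sales` and `forecasts` where the solution lists the stemmed forms `sale` and `forecast`.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split, no stemming.
        for token in text.lower().split():
            postings[token].add(doc_id)
    # Sort terms alphabetically and docIDs numerically, as in the solution.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}

index = build_inverted_index(docs)
for term, ids in index.items():
    print(term, "->", " -> ".join(map(str, ids)))
```

Each postings list stores a document ID once even when the term occurs twice in the document, as `in` does in Doc 3.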

Q2. Consider these documents:

Doc 1 breakthrough drug for schizophrenia
Doc 2 new schizophrenia drug
Doc 3 new approach for treatment of schizophrenia
Doc 4 new hopes for schizophrenia patients

a. Draw the term-document incidence matrix for this document collection.

b. Draw the inverted index representation for this collection, as in Figure 1.3 (page 7).

SOLUTION.
Term-document matrix:

term            d1  d2  d3  d4
approach         0   0   1   0
breakthrough     1   0   0   0
drug             1   1   0   0
for              1   0   1   1
hopes            0   0   0   1
new              0   1   1   1
of               0   0   1   0
patients         0   0   0   1
schizophrenia    1   1   1   1
treatment        0   0   1   0

Inverted index:
approach -> 3
breakthrough -> 1
drug -> 1 -> 2
for -> 1 -> 3 -> 4
hopes -> 4
new -> 2 -> 3 -> 4
of -> 3
patients -> 4
schizophrenia -> 1 -> 2 -> 3 -> 4
treatment -> 3

Q3. For the document collection shown in Exercise 1.2 (Q2 above), what are the returned
results for these queries:
a. schizophrenia AND drug
b. for AND NOT (drug OR approach)

SOLUTION.
a. doc1, doc2
b. doc4
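
Both Q3 queries can be checked with set operations over the postings of the Q2 collection. The following is a rough sketch under the assumption that postings are held as Python sets; the helper name `postings` and the variable names are illustrative only.

```python
docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}

# Build term -> set of docIDs (postings kept as sets for easy Boolean algebra).
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_docs = set(docs)

def postings(term):
    """Return the set of docIDs for a term (empty if the term is unknown)."""
    return index.get(term, set())

# a. schizophrenia AND drug  -> set intersection
query_a = postings("schizophrenia") & postings("drug")

# b. for AND NOT (drug OR approach) -> NOT is the complement w.r.t. all documents
query_b = postings("for") & (all_docs - (postings("drug") | postings("approach")))

print(sorted(query_a), sorted(query_b))
```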

Q4. Recommend a query processing order for

(tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes) given the following postings list sizes:

Term          Postings size
eyes          213312
kaleidoscope  87009
marmalade     107913
skies         271658
tangerine     46653
trees         316812
SOLUTION. Using the conservative estimate of the length of unioned postings lists (the sum of the two list sizes), the recommended order is:
(kaleidoscope OR eyes) (300,321) AND (tangerine OR trees) (363,465) AND (marmalade OR skies) (379,571)
However, depending on the actual distribution of postings, (tangerine OR trees) may well be longer than (marmalade OR skies), because the two components of the former are more asymmetric. For example, the union of 11 and 9990 is expected to be longer than the union of 5000 and 5000 even though the conservative estimate predicts otherwise.
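
The conservative estimate can be reproduced as follows. This is a small sketch that assumes the size of each OR clause is bounded by the sum of its two postings sizes and that clauses are then processed in increasing order of that bound; the names `sizes`, `clauses` and `estimate` are illustrative.

```python
sizes = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}

clauses = [
    ("tangerine", "trees"),
    ("marmalade", "skies"),
    ("kaleidoscope", "eyes"),
]

def estimate(clause):
    """Upper bound on the size of an OR clause: the sum of its postings sizes."""
    return sum(sizes[term] for term in clause)

# Process the AND of the clauses starting with the smallest estimated union.
order = sorted(clauses, key=estimate)
for clause in order:
    print(" OR ".join(clause), estimate(clause))
```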

S. Singh's solution

Time for processing:
(i) (tangerine OR trees) = O(46653+316812) = O(363465)
(ii) (marmalade OR skies) = O(107913+271658) = O(379571)
(iii) (kaleidoscope OR eyes) = O(87009+213312) = O(300321)

Order of processing:
a. Process (i), (ii), (iii) in any order as the first 3 steps (the total time for these steps is O(363465+379571+300321) in any case).

b. Merge (i) AND (iii) = (iv): For the AND operator, the cost of merging two postings lists depends on the length of the shorter list, so the shorter that list is, the less time is spent. The reason for choosing (i) instead of (ii) is that the output list (iv) is more likely to be short if (i) is chosen.
c. Merge (iv) AND (ii): This is the only merging operation left.

Q5. Are the following statements true or false?

a. In a Boolean retrieval system, stemming never lowers precision.

b. In a Boolean retrieval system, stemming never lowers recall.

c. Stemming increases the size of the vocabulary.

d. Stemming should be invoked at indexing time but not while processing a query.

SOLUTION.
a. False. Stemming can increase the retrieved set without increasing the number of relevant documents.
b. True. Stemming can only increase the retrieved set, which means increased or unchanged recall.
c. False. Stemming decreases the size of the vocabulary.
d. False. The same processing should be applied to documents and queries to ensure matching terms.

Q6. We have a two-word query. For one term the postings list consists of the following 16 entries:

[4,6,10,12,14,16,18,20,22,32,47,81,120,122,157,180]

and for the other it is the one-entry postings list: [47].

Work out how many comparisons would be done to intersect the two postings lists with the following two strategies. Briefly
justify your answers:

a. Using standard postings lists

b. Using postings lists stored with skip pointers, with a skip length of √P, as suggested in Section 2.3.
SOLUTION.
a. Applying MERGE to the standard postings lists, comparisons are made until either postings list ends, i.e. until we reach 47 in the longer list, after which the one-entry list is exhausted and no more processing is needed. Number of comparisons = 11.

b. Using skip pointers of length 4 for the longer list and of length 1 for the shorter list, the following comparisons will be made:
1. 4 & 47
2. 14 & 47
3. 22 & 47
4. 120 & 47
5. 81 & 47
6. 47 & 47
Number of comparisons = 6.
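
The two strategies of Q6 can be sketched in Python as below. This is an illustrative sketch, not the textbook's code: `intersect` counts one comparison per pair of docIDs examined, and `intersect_with_skips` assumes skip pointers placed every ⌊√P⌋ positions (4 for the 16-entry list). How comparisons are counted in the skip version depends on convention, so only the plain merge reports a count.

```python
import math

def intersect(p1, p2):
    """Plain linear merge; returns (matches, number of docID comparisons)."""
    i = j = comparisons = 0
    answer = []
    while i < len(p1) and j < len(p2):
        comparisons += 1  # one comparison per pair of docIDs examined
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer, comparisons

def intersect_with_skips(p1, p2):
    """Merge that follows a skip pointer whenever its target does not overshoot."""
    step1 = max(1, math.isqrt(len(p1)))  # assumed skip length: floor(sqrt(P))
    step2 = max(1, math.isqrt(len(p2)))

    def skip_target(pos, step, length):
        # Skip pointers sit at positions 0, step, 2*step, ... if a target exists.
        if pos % step == 0 and pos + step < length:
            return pos + step
        return None

    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            target = skip_target(i, step1, len(p1))
            i = target if target is not None and p1[target] <= p2[j] else i + 1
        else:
            target = skip_target(j, step2, len(p2))
            j = target if target is not None and p2[target] <= p1[i] else j + 1
    return answer

long_list = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
print(intersect(long_list, [47]))
print(intersect_with_skips(long_list, [47]))
```

On these lists the skip version jumps 4 -> 14 -> 22, declines the pointer to 120 because it overshoots 47, and then advances one posting at a time to 47.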
Q7. Consider a postings intersection between this postings list, with skip pointers:

Trace through the Postings lists intersection with skip pointers.

a. How often is a skip pointer followed?

b. How many postings comparisons will be made by this algorithm while intersecting the two lists? Identify them.
c. How many postings comparisons would be made if the postings lists are intersected without the use of skip pointers?

SOLUTION.
a. The skip pointer is followed once (from 24 to 75).
b. 19 comparisons are made. Let (x, y) denote a postings comparison. The comparisons are: (3,3), (5,5), (9,89), (15,89), (24,89), (73,89), (75,89), (92,89), (81,89), (84,89), (89,89), (92,95), (115,95), (96,95), (96,97), (97,9), (100,99), (100, ...
c. 19

Q8. Shown below is a portion of a positional index in the format: term: doc1: (position1, position2, ...); doc2: (position1, position2, ...); etc.

angels: 2: (36,174,252,651); 4: (12,22,102,432); 7: (17);

fools: 2: (1,17,74,222); 4: (8,78,108,458); 7: (3,13,23,193);

fear: 2: (87,704,722,901); 4: (13,43,113,433); 7: (18,328,528);

in: 2: (3,37,76,444,851); 4: (10,20,110,470,500); 7: (5,15,25,195);

rush: 2: (2,66,194,321,702); 4: (9,69,149,429,569); 7: (4,14,404);

to: 2: (47,86,234,999); 4: (14,24,774,944); 7: (199,319,599,709);

tread: 2: (57,94,333); 4: (15,35,155); 7: (20,320);

where: 2: (67,124,393,1001); 4: (11,41,101,421,431); 7: (16,36,736);

Which document(s) if any meet each of the following queries, where each expression within quotes is a
phrase query?

a. "fools rush in"

b. "fools rush in" AND "angels fear to tread"

SOLUTION.
a. All three documents (2, 4, and 7) satisfy the query.
b. Only document 4.
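
The phrase queries of Q8 can be verified mechanically. The sketch below assumes the positional index is stored as nested dictionaries (term -> docID -> positions) and treats a phrase as matched when its k-th word occurs exactly k positions after the first word; `phrase_docs` is a hypothetical helper name.

```python
index = {
    "angels": {2: [36, 174, 252, 651], 4: [12, 22, 102, 432], 7: [17]},
    "fools": {2: [1, 17, 74, 222], 4: [8, 78, 108, 458], 7: [3, 13, 23, 193]},
    "fear": {2: [87, 704, 722, 901], 4: [13, 43, 113, 433], 7: [18, 328, 528]},
    "in": {2: [3, 37, 76, 444, 851], 4: [10, 20, 110, 470, 500], 7: [5, 15, 25, 195]},
    "rush": {2: [2, 66, 194, 321, 702], 4: [9, 69, 149, 429, 569], 7: [4, 14, 404]},
    "to": {2: [47, 86, 234, 999], 4: [14, 24, 774, 944], 7: [199, 319, 599, 709]},
    "tread": {2: [57, 94, 333], 4: [15, 35, 155], 7: [20, 320]},
    "where": {2: [67, 124, 393, 1001], 4: [11, 41, 101, 421, 431], 7: [16, 36, 736]},
}

def phrase_docs(phrase):
    """Return the sorted docIDs in which the words occur at consecutive positions."""
    terms = phrase.split()
    # Only documents containing every term can match.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc in sorted(candidates):
        for start in index[terms[0]][doc]:
            # The k-th term must appear at position start + k.
            if all(start + k in index[t][doc] for k, t in enumerate(terms)):
                hits.append(doc)
                break
    return hits

answer_a = phrase_docs("fools rush in")
answer_b = sorted(set(answer_a) & set(phrase_docs("angels fear to tread")))
print(answer_a, answer_b)
```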

Q9. Write down the entries in the permuterm index dictionary that are generated by the term
mama.

SOLUTION.
mama$, ama$m, ma$ma, a$mam, $mama.

Q10. If you wanted to search for s*ng in a permuterm wildcard index, what key(s) would one do the lookup on?
SOLUTION. ng$s*
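
The answers to Q9 and Q10 follow from two small string manipulations, sketched here. The function names are hypothetical, and `wildcard_lookup_key` assumes a query with exactly one `*`.

```python
def permuterm_rotations(term):
    """All rotations of term + '$', the entries added to the permuterm dictionary."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]

def wildcard_lookup_key(query):
    """Rotate a single-wildcard query X*Y so that the * ends up last: Y$X*."""
    prefix, suffix = query.split("*")  # assumes exactly one '*'
    return suffix + "$" + prefix + "*"

print(permuterm_rotations("mama"))
print(wildcard_lookup_key("s*ng"))
```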

Q11. Compute the edit distance between paris and alice.

SOLUTION.
        a  l  i  c  e
     0  1  2  3  4  5
p    1  1  2  3  4  5
a    2  1  2  3  4  5
r    3  2  2  3  4  5
i    4  3  3  2  3  4
s    5  4  4  3  3  4

Each entry is the minimum edit cost between the corresponding prefixes of paris and alice. The edit distance is the bottom-right entry: 4.
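
The table for Q11 can be recomputed with the standard dynamic program for Levenshtein distance. This is a minimal sketch assuming unit cost for insertion, deletion and substitution; the function name `edit_distance` is illustrative.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit costs for insert, delete and substitute."""
    m, n = len(s), len(t)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of the prefix of s
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of the prefix of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + substitution,  # copy or substitute
                dist[i - 1][j] + 1,                 # delete from s
                dist[i][j - 1] + 1,                 # insert into s
            )
    return dist[m][n]

print(edit_distance("paris", "alice"))
```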

Q12. Starting from the following documents collection, build the documents-terms incidence
matrix as required by the Boolean model

d1 = “Big cats are nice and funny”

d2 = “Small dogs are better than big dogs”

d3 = “Small cats are afraid of small dogs”

d4 = “Big cats are not afraid of small dogs”

d5 = “Funny cats are not afraid of small dogs”
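
No solution for Q12 appears here, but the matrix can be generated mechanically. The sketch below assumes that terms are lowercased, split on whitespace and sorted alphabetically, with no stop-word removal or stemming; a different preprocessing choice would change the vocabulary.

```python
docs = {
    "d1": "Big cats are nice and funny",
    "d2": "Small dogs are better than big dogs",
    "d3": "Small cats are afraid of small dogs",
    "d4": "Big cats are not afraid of small dogs",
    "d5": "Funny cats are not afraid of small dogs",
}

# Assumption: lowercase + whitespace tokenization, every word is a term.
tokenized = {name: set(text.lower().split()) for name, text in docs.items()}
vocabulary = sorted(set().union(*tokenized.values()))

# One row per document, one column per term; 1 if the term occurs, else 0.
matrix = {
    name: [int(term in tokens) for term in vocabulary]
    for name, tokens in tokenized.items()
}

print("   ", " ".join(vocabulary))
for name, row in matrix.items():
    print(name, row)
```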

Q13. An IR system returns 3 relevant documents, and 2 irrelevant documents. There are a total of
8 relevant documents in the collection. What is the precision of the system on this search, and
what is its recall?

The precision is given by tp/(tp+fp) = 3/5

The recall is given by tp/(tp+fn) = 3/8
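
As a quick check of the arithmetic for Q13, the two formulas can be evaluated directly. In this sketch the number of false negatives is derived as the 8 relevant documents minus the 3 that were retrieved; the function name is illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)

tp = 3            # relevant documents returned
fp = 2            # irrelevant documents returned
fn = 8 - tp       # relevant documents in the collection that were missed

precision, recall = precision_recall(tp, fp, fn)
print(precision, recall)
```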
