Test-1 - Solution

This document contains the test solutions for an Information Retrieval course. It includes 12 multiple-choice and short-answer questions testing concepts like Boolean queries, term frequencies, query evaluation strategies, normalization, indexes, and vector space models. The test covers topics like intersections of postings lists, skip pointers, Jaccard coefficient, Soundex codes, and assumptions of the vector space model.

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

HYDERABAD CAMPUS
FIRST SEMESTER 2020 – 2021
INFORMATION RETRIEVAL (CS F469) – TEST-1 SOLUTION

Date: 17.09.2020 Weightage: 12% [24 Marks] Duration: 30 mins. Type: Open Book

Q1. Given the corpus consisting of the following five documents:

d1 = "big elephants are nice and funny"
d2 = "small cubs are better than big cubs"
d3 = "small elephants are afraid of small cubs"
d4 = "big elephants are not afraid of small cubs"
d5 = "funny elephants are not afraid of small cubs"

Which documents would be retrieved for the query q = big AND cubs AND NOT funny? [2 M]

big occurs in (d1, d2, d4), cubs in (d2, d3, d4, d5) and funny in (d1, d5), so NOT funny gives (d2, d3, d4).
(d1, d2, d4) AND (d2, d3, d4, d5) AND (d2, d3, d4) = d2, d4
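
The same evaluation can be sketched in a few lines of Python. This is only an illustration: whitespace tokenization and the helper name build_index are assumptions of the sketch, not part of the test.

```python
# Toy corpus from Q1; splitting on whitespace is an assumption of this sketch.
docs = {
    "d1": "big elephants are nice and funny",
    "d2": "small cubs are better than big cubs",
    "d3": "small elephants are afraid of small cubs",
    "d4": "big elephants are not afraid of small cubs",
    "d5": "funny elephants are not afraid of small cubs",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


index = build_index(docs)
all_docs = set(docs)

# q = big AND cubs AND NOT funny
# NOT is taken as the complement with respect to the whole corpus.
result = index["big"] & index["cubs"] & (all_docs - index["funny"])
print(sorted(result))
```

Set intersection plays the role of AND here, and set difference from the full corpus plays the role of NOT.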

Q2. Which of the following pairs of Boolean queries returns the same set of documents, with a, b
and c as terms? i. a OR b OR c ii. (a OR b) AND c iii. (a AND c) OR (b AND c) [2 M]

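A. i & ii
B. ii & iii
C. i & iii
D. None

Answer: B. By distributivity of AND over OR, (a OR b) AND c = (a AND c) OR (b AND c), whereas a OR b OR c also matches documents that do not contain c.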
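
One way to check such equivalences mechanically is to compare the three queries over all eight truth assignments of a, b and c. The sketch below does that; the dictionary keys and the helper name equivalent are illustrative choices.

```python
from itertools import product

# The three queries from Q2, written as Boolean functions of a, b, c.
queries = {
    "i": lambda a, b, c: a or b or c,
    "ii": lambda a, b, c: (a or b) and c,
    "iii": lambda a, b, c: (a and c) or (b and c),
}


def equivalent(f, g):
    """True if f and g agree on every truth assignment of (a, b, c)."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=3))


# Collect every unordered pair of queries that agree everywhere.
equivalent_pairs = [
    (p, q)
    for p in queries
    for q in queries
    if p < q and equivalent(queries[p], queries[q])
]
print(equivalent_pairs)
```

Two queries that agree on every assignment return the same documents for any corpus, since each document corresponds to one assignment.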

Q3. In a corpus of 300,000 documents we have the following term frequencies for some of the terms:

Term                 Frequency
ShivKera             24,000
RuskinBond            1,000
ChetanBhagat         10,000
VikramSeth            4,000
RabindranathTagore   13,000
KiranDesai            7,000

Propose an evaluation plan for the following query: (ShivKera OR RuskinBond) AND (ChetanBhagat OR
VikramSeth) AND (RabindranathTagore AND KiranDesai) in order to minimize the list processing time. [3 M]

The sizes of the subqueries can be bounded as follows (an OR by the sum of the two frequencies, an AND by the smaller of the two):

Subquery                               Size of the Boolean result
(ShivKera OR RuskinBond)               25,000
(ChetanBhagat OR VikramSeth)           14,000
(RabindranathTagore AND KiranDesai)     7,000

We evaluate the subqueries in increasing order of their sizes, so that the intermediate results stay as
small as possible. Hence, to minimize the list processing time, the following order needs to be used:

(RabindranathTagore AND KiranDesai) AND (ChetanBhagat OR VikramSeth) AND (ShivKera OR RuskinBond)
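
The size estimates and the resulting order can be reproduced with a small sketch. The tuple representation of a subquery and the function name estimate are assumptions made for illustration; the bounds are the usual conservative ones (sum for OR, minimum for AND).

```python
# Frequencies from Q3 (number of documents in each term's postings list).
freq = {
    "ShivKera": 24_000,
    "RuskinBond": 1_000,
    "ChetanBhagat": 10_000,
    "VikramSeth": 4_000,
    "RabindranathTagore": 13_000,
    "KiranDesai": 7_000,
}

# Each subquery is (operator, left term, right term).
subqueries = [
    ("OR", "ShivKera", "RuskinBond"),
    ("OR", "ChetanBhagat", "VikramSeth"),
    ("AND", "RabindranathTagore", "KiranDesai"),
]


def estimate(subquery):
    """Upper bound on the result size: sum for OR, minimum for AND."""
    op, left, right = subquery
    if op == "OR":
        return freq[left] + freq[right]
    return min(freq[left], freq[right])


# Process the smallest estimated result first.
plan = sorted(subqueries, key=estimate)
for sub in plan:
    print(sub, estimate(sub))
```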


Q4. Given the following postings lists for the two terms of an AND query: [2 M]
T1: (4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180)
T2: (47)
How many comparisons would be done to intersect the two postings lists with each of the following two strategies?
i. Using standard postings lists.
ii. Using skip lists.

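i. 11 comparisons are required if standard postings lists are used: 47 is the 11th entry of T1, so the
merge compares it against 4, 6, ..., 32 and finally 47.

ii. 5 comparisons are required if skip lists are used. This value depends on where the skip pointers are
placed.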
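
The count for the standard case can be checked with a minimal sketch of the linear merge. It assumes one comparison is counted per loop iteration (one pair of docIDs examined); the function name intersect_count is illustrative. Skip pointers are left out because their count depends on placement and on the counting convention.

```python
def intersect_count(p1, p2):
    """Linear merge of two sorted postings lists.

    Returns the intersection and the number of docID pairs examined.
    """
    i = j = comparisons = 0
    answer = []
    while i < len(p1) and j < len(p2):
        comparisons += 1  # one pair of docIDs is compared per iteration
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer, comparisons


t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
t2 = [47]
print(intersect_count(t1, t2))
```

The loop stops as soon as T2 is exhausted, which is why the entries after 47 in T1 are never examined.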

Q5. For what type of Boolean operators are skip pointers not useful? [Note: give your answer in the form
of Term1 BooleanOperator Term2] [2 M]
Term1 OR Term2
Term1 OR NOT Term2
NOT Term1 OR Term2

Q6. Compute the Jaccard coefficient between “top” and “stop” using bigrams. [2 M]

The bigrams for top are {to, op} and for stop are {st, to, op}. The intersection has 2 bigrams and the union has 3, so the Jaccard coefficient = 2/3 = 0.667
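
A short sketch of the same computation follows. Like the solution above, it uses plain character bigrams without word-boundary markers; the function names are illustrative.

```python
def bigrams(word):
    """Set of character bigrams of a word (no boundary markers)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the bigram sets."""
    a_grams, b_grams = bigrams(a), bigrams(b)
    return len(a_grams & b_grams) / len(a_grams | b_grams)


print(bigrams("top"), bigrams("stop"))
print(round(jaccard("top", "stop"), 3))
```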

Q7. Suppose you are designing a search engine and you would like to normalize the following tokens
Diamond, Platinum, gold, silver etc into one term. Which approach would be appropriate for
normalization and why? [2 M]

Equivalence classes: The most standard way to handle synonyms and related terms is to create
hand-constructed equivalence classes. For example, to treat the terms petrol, gasoline and diesel
as one, we create a new term, say petroleum. When the entries in the inverted index are made, the
postings list for petroleum is the union of all the documents containing petrol, gasoline or
diesel.
When the query term petrol is entered, it is converted into petroleum, the inverted index is
searched for the term petroleum, and all its postings are returned. This fetches all documents
containing petrol, gasoline or diesel. This approach is fast at run time.
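
A minimal sketch of this idea, using the petrol example: the mapping table, the four sample documents and the function names are all hypothetical and only illustrate that the same normalization is applied at indexing time and at query time.

```python
# Hand-constructed equivalence classes (hypothetical table for illustration).
EQUIVALENCE = {
    "petrol": "petroleum",
    "gasoline": "petroleum",
    "diesel": "petroleum",
}


def normalize(token):
    """Lowercase the token and map it to its class representative, if any."""
    token = token.lower()
    return EQUIVALENCE.get(token, token)


def build_index(docs):
    """Inverted index over normalized terms: term -> set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


# Hypothetical documents.
docs = {
    1: "petrol prices rise",
    2: "Gasoline tax cut",
    3: "Diesel engines",
    4: "electric cars",
}
index = build_index(docs)

# The query term goes through the same normalization before lookup.
hits = index.get(normalize("petrol"), set())
print(sorted(hits))
```

Because the union is computed once at indexing time, a query needs only a single postings lookup, which is why the approach is fast at run time.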

Q8. If you use an extended biword index to store the following sentence in a document “mary and john
are seeking jobs during pandemic” how many terms would be added? [2 M]

In this case we can perform POS tagging, which assigns the tags "N X N X X N X N" to the words
(N = noun, X = any other word). From this list of POS tags, every pattern of the form N X* N (two nouns
separated by zero or more non-nouns) is added as a term to the inverted index: "mary john", "john jobs"
and "jobs pandemic". Hence 3 terms are added to the biword index.
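
The extraction of N X* N patterns can be sketched as follows. The POS tags are supplied by hand exactly as in the solution above (no tagger is run), and the function name extended_biwords is an illustrative choice.

```python
def extended_biwords(tokens, tags):
    """Return one term for every N X* N pattern in the tagged sentence.

    With only the tags N (noun) and X (other), each noun pairs with the
    next noun that follows it, however many X words lie in between.
    """
    terms = []
    prev_noun = None
    for token, tag in zip(tokens, tags):
        if tag == "N":
            if prev_noun is not None:
                terms.append(f"{prev_noun} {token}")
            prev_noun = token
    return terms


tokens = "mary and john are seeking jobs during pandemic".split()
tags = "N X N X X N X N".split()  # tags taken from the solution, not computed
terms = extended_biwords(tokens, tags)
print(terms, len(terms))
```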

Q9. Suppose a user enters a wildcard query Sh*sp*re (Shakespeare). What key would you use to
look up in the permuterm index? [2 M]

The key to look up in the permuterm index is re$Sh*. The terms it returns are then filtered to keep only those that also contain sp in the middle.
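
The lookup can be illustrated with a small sketch. It assumes the query contains at least one wildcard, stores every rotation of term + "$" as the permuterm vocabulary, and post-filters on the middle parts; the function names are illustrative.

```python
def rotations(term):
    """All rotations of term + '$', as stored in a permuterm index."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def permuterm_key(query):
    """For a query X*...*Z (at least one '*'), return the key Z$X*
    and the list of middle parts that must be checked afterwards."""
    parts = query.split("*")
    first, middle, last = parts[0], parts[1:-1], parts[-1]
    return last + "$" + first + "*", middle


def matches(term, query):
    """True if term satisfies the wildcard query via the permuterm lookup."""
    key, middle = permuterm_key(query)
    prefix = key[:-1]  # drop the trailing '*': the lookup is a prefix search
    if not any(rot.startswith(prefix) for rot in rotations(term)):
        return False
    # Post-filter: middle parts must occur, in order, between X and Z.
    parts = query.split("*")
    rest = term[len(parts[0]):len(term) - len(parts[-1])]
    pos = 0
    for part in middle:
        pos = rest.find(part, pos)
        if pos < 0:
            return False
        pos += len(part)
    return True


print(permuterm_key("Sh*sp*re"))
print(matches("Shakespeare", "Sh*sp*re"), matches("Shore", "Sh*sp*re"))
```

Shore passes the prefix lookup on re$Sh but is rejected by the post-filter, which shows why the filtering step is needed for queries with more than one wildcard.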

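Q10. Give the Soundex codes for the following names: Lorren and Lauren. [2 M]

Both map to the same Soundex code, L650: the first letter L is kept, the vowels are dropped, r maps to 6 (the double r in Lorren collapses to a single 6), n maps to 5, and the code is padded with a trailing zero.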
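
The steps can be sketched in Python. This follows the common textbook variant (keep the first letter, encode the rest, collapse repeated digits, drop zeros, pad to four characters) and assumes a non-empty alphabetic name; other Soundex variants treat a few edge cases differently.

```python
# Letter groups of the textbook Soundex variant; "0" marks letters to drop.
GROUPS = [
    ("aeiouhwy", "0"),
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
]
CODES = {ch: digit for letters, digit in GROUPS for ch in letters}


def soundex(name):
    """Four-character Soundex code (assumes a non-empty alphabetic name)."""
    name = name.lower()
    digits = [CODES[ch] for ch in name[1:] if ch in CODES]
    # Collapse each run of identical digits into a single digit.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Drop the zeros, pad with trailing zeros, keep four characters.
    tail = "".join(d for d in collapsed if d != "0")
    return (name[0].upper() + tail + "000")[:4]


print(soundex("Lorren"), soundex("Lauren"))
```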


Q11. Positional indexes are a more efficient alternative to ____________________ indexes. [1 M]
Permuterm index
Biword index
Inverted index
Hash table
Answer: Biword index. Both support phrase queries, but a positional index does so without storing every pair of adjacent words as a separate term.

Q12. In the vector space model, the assumption is that the dimensions are orthogonal to each other. What
does this mean, and why is this assumption not realistic? [2 M]

Saying that the dimensions are orthogonal means that each term is treated as independent of every other
term: the presence of one word says nothing about the presence of another. This assumption is unrealistic
because words are correlated and tend to appear together.
