Test-1 - Solution
HYDERABAD CAMPUS
FIRST SEMESTER 2020 – 2021
INFORMATION RETRIEVAL (CS F469) – TEST-1 SOLUTION
Date: 17.09.2020 Weightage: 12% [24 Marks] Duration: 30mins. Type: Open Book
Q2. Which of the following pairs of Boolean queries return the same set of documents for the terms a, b
and c? i. a OR b OR c ii. (a OR b) AND c iii. (a AND c) OR (b AND c) [2 M]
A. i & ii
B. ii & iii
C. i & iii
D. None
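Queries ii and iii are equivalent because AND distributes over OR, which corresponds to option B. As a quick sanity check, the sketch below enumerates all eight truth assignments for a, b and c (where a value of True stands for "the document contains the term"). The names `queries`, `truth_table` and `equivalent_pairs` are illustrative only.

```python
from itertools import product

# Each query as a Boolean function of term presence in a single document.
queries = {
    "i": lambda a, b, c: a or b or c,
    "ii": lambda a, b, c: (a or b) and c,
    "iii": lambda a, b, c: (a and c) or (b and c),
}


def truth_table(query):
    """Evaluate a query on all 2^3 combinations of term presence."""
    return tuple(bool(query(a, b, c)) for a, b, c in product([False, True], repeat=3))


tables = {name: truth_table(q) for name, q in queries.items()}

# Two queries return the same documents on every corpus iff their truth tables match.
equivalent_pairs = [
    (x, y) for x in tables for y in tables if x < y and tables[x] == tables[y]
]
print(equivalent_pairs)  # [('ii', 'iii')]
```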
Q3. In a corpus of 3,00,000 documents, we have the following term frequencies for some of the terms:
Propose an evaluation plan for the following query: (ShivKera OR RuskinBond) AND (ChetanBhagat OR
VikramSeth) AND (RabindranathTagore AND KiranDesai) in order to minimize the list processing time.
We evaluate the query groups in increasing order of their estimated sizes (for an OR group, the sum of
its terms' frequencies). Hence, to minimize the list processing time, the following order needs to be used
Q5. For what type of Boolean operators are skip pointers not useful? [Note: give your answer in the form
of Term1 BooleanOperator Term2] [2 M]
Term1 OR Term2
Term1 OR NOT Term2
NOT Term1 OR Term2
Q6. Compute the Jaccard coefficient between “top” and “stop” using bigrams. [2 M]
The bigrams for “top” are {to, op} and for “stop” are {st, to, op}. The intersection has 2 bigrams and the union has 3, so the Jaccard coefficient = 2/3 ≈ 0.667
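The same computation can be sketched in a few lines. This assumes plain character bigrams without word-boundary markers such as `$`, matching the answer above. The helper names `bigrams` and `jaccard` are illustrative.

```python
def bigrams(word):
    """Return the set of character bigrams of a word (no boundary markers)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(word_a, word_b):
    """Jaccard coefficient: |A intersect B| / |A union B| over the bigram sets."""
    a, b = bigrams(word_a), bigrams(word_b)
    return len(a & b) / len(a | b)


print(sorted(bigrams("top")))    # ['op', 'to']
print(sorted(bigrams("stop")))   # ['op', 'st', 'to']
print(round(jaccard("top", "stop"), 3))  # 0.667
```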
Q7. Suppose you are designing a search engine and you would like to normalize the following tokens
Diamond, Platinum, gold, silver etc into one term. Which approach would be appropriate for
normalization and why? [2 M]
Equivalence classing: The most standard way to handle such related terms is to create
hand-constructed equivalence classes. For example, if we want to treat the terms petrol,
Gasoline, Diesel, etc. as one, we create a new term, say petroleum. When the entries in
the inverted index are made, the postings list for petroleum is the union of all the documents
containing petrol, Gasoline or Diesel.
When the query term petrol is entered, it is converted into petroleum, the inverted index is
searched for the term petroleum, and all of its postings are returned. This fetches all
documents containing petrol, Gasoline or Diesel. This approach is fast at run time.
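A minimal sketch of this idea, assuming a whitespace tokenizer and a hand-written mapping. The `EQUIV` table, the sample `docs` and the function names are made up for illustration.

```python
from collections import defaultdict

# Hand-constructed equivalence class: every member maps to one class term.
EQUIV = {"petrol": "petroleum", "gasoline": "petroleum", "diesel": "petroleum"}


def normalize(token):
    """Lower-case the token and replace it by its class term, if it has one."""
    token = token.lower()
    return EQUIV.get(token, token)


def build_index(docs):
    """Build an inverted index whose keys are normalized terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, term):
    """Normalize the query term the same way, then do a single lookup."""
    return sorted(index.get(normalize(term), set()))


# Toy documents, invented for this example.
docs = {
    1: "petrol prices rise",
    2: "Gasoline shortage reported",
    3: "Diesel engines banned",
    4: "electric cars sold",
}
index = build_index(docs)
print(search(index, "petrol"))  # [1, 2, 3]
```

Because the mapping is applied while indexing, a query costs one dictionary lookup. The alternative of expanding the query into petrol OR Gasoline OR Diesel would need several lookups and a merge at query time.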
Q8. If you use an extended biword index to store the following sentence in a document “mary and john
are seeking jobs during pandemic” how many terms would be added? [2 M]
In this case we perform POS tagging, which assigns the following tags to the words: “N X N
X X N X N”. From this list of POS tags, every sequence of the form N X* N (two nouns with any number of
non-noun words between them) is added as a term in the inverted index. Hence we will have 3 terms
(mary john, john jobs, jobs pandemic) added in the biword index.
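The count can be reproduced with a short sketch that takes the tagging above as given (N for noun, X for anything else) and pairs each noun with the next noun, skipping the X words in between. The function name `extended_biwords` is illustrative, and no real POS tagger is invoked.

```python
def extended_biwords(tokens, tags):
    """Pair each noun with the next noun, skipping the non-noun (X) words between them."""
    biwords = []
    previous_noun = None
    for token, tag in zip(tokens, tags):
        if tag == "N":
            if previous_noun is not None:
                biwords.append(f"{previous_noun} {token}")
            previous_noun = token
    return biwords


tokens = "mary and john are seeking jobs during pandemic".split()
tags = "N X N X X N X N".split()  # tags taken from the answer above

terms = extended_biwords(tokens, tags)
print(terms)       # ['mary john', 'john jobs', 'jobs pandemic']
print(len(terms))  # 3
```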
Q9. Suppose a user enters the wildcard query Sh*sp*re (Shakespeare). What key would you use to
look up in the permuterm index? [2 M]
The key we look up in the permuterm index is re$Sh*. The matching terms are then filtered to keep those that also contain sp.
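A small sketch of how this key is derived: for a query of the form X*...*Z, the lookup key is Z$X*, which matches the rotation of term$ that begins with Z$X. The helper names `permuterm_key` and `rotations` are illustrative.

```python
def permuterm_key(query):
    """Build the permuterm lookup key Z$X* for a wildcard query X*...*Z."""
    parts = query.split("*")
    if len(parts) < 2:
        raise ValueError("query contains no wildcard")
    first, last = parts[0], parts[-1]
    return f"{last}${first}*"


def rotations(term):
    """All rotations of term + '$', i.e. the keys a permuterm index stores for the term."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


key = permuterm_key("Sh*sp*re")
print(key)  # re$Sh*

# The stored rotation 're$Shakespea' starts with the key's prefix 're$Sh'.
prefix = key.rstrip("*")
print([r for r in rotations("Shakespeare") if r.startswith(prefix)])  # ['re$Shakespea']
```

Middle fragments such as sp are not part of the key. Candidates returned by the prefix lookup still have to be checked against the full pattern.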
Q10. Give the Soundex code for the following names Lorren and Lauren. [2 M]
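Under the standard four-character Soundex scheme, both names reduce to the same code, L650. The sketch below follows the simplified textbook procedure: keep the first letter, map the remaining letters to digits, collapse runs of identical digits, drop the zeros, then pad to four characters. It is an illustration only. The `soundex` function and `CODES` table are names chosen here, and some official variants add special handling for H and W and for the first letter that is not implemented.

```python
# Letter-to-digit table; vowels and H, W, Y map to the placeholder '0'.
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"),
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code of a name (simplified variant)."""
    name = name.upper()
    first = name[0]
    digits = [CODES[ch] for ch in name[1:] if ch in CODES]

    # Collapse each run of consecutive identical digits into a single digit.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)

    # Drop the vowel placeholders, then pad with zeros to length four.
    tail = "".join(d for d in collapsed if d != "0")
    return (first + tail + "000")[:4]


print(soundex("Lorren"))  # L650
print(soundex("Lauren"))  # L650
```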
When we say each dimension is orthogonal, it means that the occurrence of each word is independent of
the others. This assumption makes the vector space model unrealistic, since words tend to appear together.