Test-1 - Solution

This document contains the test solutions for an Information Retrieval course. It includes 12 multiple choice and short answer questions testing concepts like Boolean queries, term frequencies, query evaluation strategies, normalization, indexes, and vector space models. The test covers topics like intersections of postings lists, skip pointers, the Jaccard coefficient, Soundex codes, and assumptions of the vector space model.

Uploaded by

MADHUR SARAF JAIN

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

HYDERABAD CAMPUS
FIRST SEMESTER 2020 – 2021
INFORMATION RETRIEVAL (CS F469) – TEST-1 SOLUTION

Date: 17.09.2020 Weightage: 12% [24 Marks] Duration: 30mins. Type: Open Book

Q1. Given the corpus consisting of the following five documents

d1 = "big elephants are nice and funny"
d2 = "small cubs are better than big cubs"
d3 = "small elephants are afraid of small cubs"
d4 = "big elephants are not afraid of small cubs"
d5 = "funny elephants are not afraid of small cubs"
Which documents would be retrieved for the query q = big AND cubs AND NOT funny? [2M]

Postings: big = (d1, d2, d4), cubs = (d2, d3, d4, d5), NOT funny = (d2, d3, d4).
(d1, d2, d4) AND (d2, d3, d4, d5) AND (d2, d3, d4) = d2, d4
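The Q1 result can be checked with a small sketch that treats each postings list as a set of document IDs. The helper name `postings` and the whitespace tokenization are assumptions made for illustration only.

```python
# Toy corpus from Q1.
docs = {
    "d1": "big elephants are nice and funny",
    "d2": "small cubs are better than big cubs",
    "d3": "small elephants are afraid of small cubs",
    "d4": "big elephants are not afraid of small cubs",
    "d5": "funny elephants are not afraid of small cubs",
}


def postings(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


all_docs = set(docs)

# AND is set intersection; NOT is the complement with respect to the corpus.
result = postings("big") & postings("cubs") & (all_docs - postings("funny"))
print(sorted(result))  # ['d2', 'd4']
```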

Q2. Which of the following pairs of Boolean queries return the same set of documents for terms a, b
and c? i. a OR b OR c ii. (a OR b) AND c iii. (a AND c) OR (b AND c) [2M]

A. i & ii
B. ii & iii
C. i & iii
D. None

Answer: B. By distributivity, (a OR b) AND c = (a AND c) OR (b AND c).
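The three queries in Q2 can also be compared by brute force: two Boolean queries return the same documents for every corpus exactly when their truth tables agree. The sketch below enumerates all eight assignments of a, b and c; the names `queries` and `truth_table` are illustrative.

```python
from itertools import combinations, product

# The three candidate queries, as Boolean functions of a, b and c.
queries = {
    "i": lambda a, b, c: a or b or c,
    "ii": lambda a, b, c: (a or b) and c,
    "iii": lambda a, b, c: (a and c) or (b and c),
}


def truth_table(query):
    """Evaluate a query on all 8 combinations of (a, b, c)."""
    return tuple(bool(query(a, b, c)) for a, b, c in product([False, True], repeat=3))


equivalent_pairs = [
    (x, y)
    for x, y in combinations(queries, 2)
    if truth_table(queries[x]) == truth_table(queries[y])
]
print(equivalent_pairs)  # [('ii', 'iii')]
```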

Q3. In a corpus of 300,000 documents we have the following term frequencies (number of documents containing the term) for some of the terms:

Term                  Frequency
ShivKera              24,000
RuskinBond             1,000
ChetanBhagat          10,000
VikramSeth             4,000
RabindranathTagore    13,000
KiranDesai             7,000

Propose an evaluation plan for the following query: (ShivKera or RuskinBond) AND (ChetanBhagat or
VikramSeth) AND (RabindranathTagore AND KiranDesai) in order to minimize the list processing time.

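The following would be the estimated sizes of the subqueries. An OR is bounded by the sum of the two frequencies and an AND by the smaller of the two. [3 M]

Subquery                               Size of the Boolean result
(ShivKera or RuskinBond)               25,000
(ChetanBhagat or VikramSeth)           14,000
(RabindranathTagore AND KiranDesai)     7,000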

We evaluate the subqueries in increasing order of their estimated sizes, so to minimize the list processing time
the following order needs to be used:

(RabindranathTagore AND KiranDesai) AND (ChetanBhagat or VikramSeth) AND (ShivKera or RuskinBond)
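The plan for Q3 can be reproduced with a short sketch. It assumes the usual conservative estimates (sum of frequencies for OR, minimum for AND); the helper names `estimate_or` and `estimate_and` are hypothetical.

```python
# Document frequencies given in Q3.
freq = {
    "ShivKera": 24_000,
    "RuskinBond": 1_000,
    "ChetanBhagat": 10_000,
    "VikramSeth": 4_000,
    "RabindranathTagore": 13_000,
    "KiranDesai": 7_000,
}


def estimate_or(*terms):
    """Upper bound on an OR result: the sum of the postings list sizes."""
    return sum(freq[t] for t in terms)


def estimate_and(*terms):
    """Upper bound on an AND result: the smallest postings list size."""
    return min(freq[t] for t in terms)


subqueries = {
    "(ShivKera OR RuskinBond)": estimate_or("ShivKera", "RuskinBond"),
    "(ChetanBhagat OR VikramSeth)": estimate_or("ChetanBhagat", "VikramSeth"),
    "(RabindranathTagore AND KiranDesai)": estimate_and("RabindranathTagore", "KiranDesai"),
}

# Process the smallest estimated result first.
plan = sorted(subqueries, key=subqueries.get)
print(" AND ".join(plan))
```

Starting with the smallest intermediate result keeps every later intersection no larger than that result.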

Q4. Given the following postings lists for two terms of an AND query [2 M]
T1: (4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180)
T2: (47)
How many comparisons are needed to intersect the two postings lists with each of the following two strategies?
i. Using standard postings lists.
ii. Using skip lists.

i. 11 comparisons are required if standard postings lists are used: 47 is compared with each entry of T1 from 4 up to and including 47.

ii. 5 comparisons are required if skip lists are used. This value depends on where the skip pointers are
placed.
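The count for the standard merge in Q4(i) can be verified with a sketch of the linear intersection that counts one docID comparison per step. The function name `intersect_count` is illustrative. The skip-list variant is left out because, as noted above, its count depends on where the skip pointers sit.

```python
def intersect_count(p1, p2):
    """Linear merge of two sorted postings lists.

    Returns the intersection and the number of docID comparisons,
    counting one comparison per loop step.
    """
    i = j = comparisons = 0
    answer = []
    while i < len(p1) and j < len(p2):
        comparisons += 1
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer, comparisons


T1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
T2 = [47]
print(intersect_count(T1, T2))  # ([47], 11)
```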

Q5. For what type of Boolean operators are skip pointers not useful? [Note: give your answer in the form
of Term1 BooleanOperator Term2] [2 M]
Term1 OR Term2
Term1 OR NOT Term2
NOT Term1 OR Term2

Q6. Compute the Jaccard coefficient between “top” and “stop” using bigrams. [2 M]

The bigrams for top are {to, op} and for stop are {st, to, op}. The intersection has 2 bigrams and the union has 3, so the Jaccard coefficient = 2/3 = 0.667
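The Q6 computation can be sketched as follows. Like the answer above, this assumes plain character bigrams with no word-boundary markers; the function names are illustrative.

```python
def bigrams(word):
    """Set of character bigrams of a word (no boundary markers)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the two bigram sets."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


print(sorted(bigrams("top")))   # ['op', 'to']
print(sorted(bigrams("stop")))  # ['op', 'st', 'to']
print(round(jaccard("top", "stop"), 3))  # 0.667
```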

Q7. Suppose you are designing a search engine and you would like to normalize the following tokens
Diamond, Platinum, gold, silver etc into one term. Which approach would be appropriate for
normalization and why? [2 M]

Equivalence classing: The most standard way to handle such related terms is to create hand-constructed
equivalence classes. For example, to treat the terms petrol, Gasoline and Diesel as one, we create
a new term, say petroleum. When the entries in the inverted index are made, the postings for
petroleum are the union of all the documents containing petrol, Gasoline or Diesel.
When the query term petrol is entered, it is converted into petroleum, the inverted index is
searched for petroleum, and all its postings are returned. This fetches all documents containing
petrol, Gasoline or Diesel. This approach is fast at run time.
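A minimal sketch of equivalence classing for the tokens in Q7 is given below. The class name `precious_material`, the three sample documents and the helper names are all hypothetical, chosen only to illustrate that the mapping is applied both at index time and at query time.

```python
from collections import defaultdict

# Hand-constructed equivalence class (the class name is hypothetical).
EQUIVALENCE = {
    "diamond": "precious_material",
    "platinum": "precious_material",
    "gold": "precious_material",
    "silver": "precious_material",
}


def normalize(token):
    """Lowercase the token and map it to its class term, if it has one."""
    token = token.lower()
    return EQUIVALENCE.get(token, token)


def build_index(docs):
    """Inverted index keyed by normalized terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


# Hypothetical documents.
docs = {1: "Gold prices rise", 2: "Platinum and silver fall", 3: "oil prices rise"}
index = build_index(docs)


def search(term):
    """Normalize the query term the same way, then do a single lookup."""
    return sorted(index.get(normalize(term), set()))


print(search("Diamond"))  # [1, 2]
```

Because the union is built once at index time, a query needs only one dictionary lookup, which is why the approach is fast at run time.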

Q8. If you use an extended biword index to store the following sentence in a document “mary and john
are seeking jobs during pandemic” how many terms would be added? [2 M]

In this case we can perform POS tagging, which assigns the tags “N X N X X N X N” to the words
(N = noun, X = any other word). From this list of POS tags we add each pair of nouns separated only by
X words as a term in the index: “mary john”, “john jobs” and “jobs pandemic”. Hence 3 terms are added
to the biword index.
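The count in Q8 can be reproduced with a short sketch. The tags are entered by hand exactly as in the answer (no real POS tagger is called), and the function name `extended_biwords` is illustrative. Since every token here is tagged either N or X, pairing each noun with the next noun matches the pattern N X* N.

```python
# Hand-assigned tags from the answer: N = noun, X = any other word.
tagged = [
    ("mary", "N"), ("and", "X"), ("john", "N"), ("are", "X"),
    ("seeking", "N" if False else "X"), ("jobs", "N"),
    ("during", "X"), ("pandemic", "N"),
]


def extended_biwords(tagged_tokens):
    """Pair each noun with the next noun, skipping the X words in between."""
    nouns = [word for word, tag in tagged_tokens if tag == "N"]
    return [f"{first} {second}" for first, second in zip(nouns, nouns[1:])]


terms = extended_biwords(tagged)
print(terms)       # ['mary john', 'john jobs', 'jobs pandemic']
print(len(terms))  # 3
```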

Q9. Suppose a user enters a wildcard query Sh*sp*re (Shakespeare). What key would you use to
look up in the permuterm index? [2 M]

The key to look up in the permuterm index is re$Sh*. The terms it returns are then filtered to keep
only those that also contain “sp” between the prefix and the suffix.
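The rotation used in Q9 can be sketched as below: for a query of the form X*...*Z, the lookup key is Z$X*, and candidates are post-filtered against the full pattern. The function name `permuterm_key` and the candidate list are hypothetical; the post-filter uses the standard library's `fnmatch.fnmatchcase`.

```python
from fnmatch import fnmatchcase


def permuterm_key(query):
    """Rotate a wildcard query so the single remaining * is at the end.

    For X*...*Z the key is Z$X*; a query with no wildcard is looked up as X$.
    """
    parts = query.split("*")
    if len(parts) < 2:
        return query + "$"
    prefix, suffix = parts[0], parts[-1]
    return f"{suffix}${prefix}*"


query = "Sh*sp*re"
print(permuterm_key(query))  # re$Sh*

# Hypothetical terms that the lookup re$Sh* could return (all match Sh*re).
candidates = ["Shakespeare", "Share", "Shore"]

# Post-filter: keep only terms matching the full original pattern.
matches = [term for term in candidates if fnmatchcase(term, query)]
print(matches)  # ['Shakespeare']
```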

Q10. Give the Soundex code for the following names Lorren and Lauren. [2 M]

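Both map to the same Soundex code, L650: the first letter L is kept, the vowels are dropped, the repeated r in Lorren collapses to a single 6, n gives 5, and the code is padded with a trailing 0.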
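The Soundex codes in Q10 can be computed with a sketch of the simplified four-step procedure commonly given in information retrieval textbooks: keep the first letter, map the remaining letters to digits, collapse consecutive repeated digits, drop zeros, and pad to four characters. This is an illustrative variant; other Soundex variants treat H, W and the first letter's own digit slightly differently.

```python
# Letter groups and their Soundex digits; group "0" is removed at the end.
GROUPS = (
    ("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
    ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6"),
)
CODES = {letter: digit for letters, digit in GROUPS for letter in letters}


def soundex(name):
    """Four-character Soundex code (simplified textbook variant)."""
    letters = [ch for ch in name.upper() if ch in CODES]
    if not letters:
        return ""
    digits = [CODES[ch] for ch in letters[1:]]
    # Collapse runs of the same digit into one.
    collapsed = []
    for digit in digits:
        if not collapsed or collapsed[-1] != digit:
            collapsed.append(digit)
    # Drop zeros, then pad with trailing zeros and keep four characters.
    tail = "".join(d for d in collapsed if d != "0")
    return (letters[0] + tail + "000")[:4]


print(soundex("Lorren"))  # L650
print(soundex("Lauren"))  # L650
```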

Q11. Positional indexes are a more efficient alternative to ____________________ indexes. [1 M]
Permuterm index
Biword index
Inverted index
Hash table

Answer: Biword index

Q12. In the vector space model the assumption is that each dimension is orthogonal to each other. What
does it mean and why is this assumption not realistic? [2 M]

When we say each dimension is orthogonal, it means that the terms are treated as independent of each
other: the presence of one word says nothing about the presence of any other. This assumption is
unrealistic, since words tend to appear together.
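The orthogonality assumption in Q12 can be illustrated with a small sketch. The three-word vocabulary is hypothetical: each term gets its own axis, so the similarity between any two distinct terms is exactly zero, even for words that are closely related in practice.

```python
import math

# Hypothetical vocabulary: one axis per term.
vocab = ["car", "automobile", "banana"]


def one_hot(term):
    """Unit vector along the axis assigned to the term."""
    return [1.0 if t == term else 0.0 for t in vocab]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Related and unrelated words look equally dissimilar under this assumption.
print(cosine(one_hot("car"), one_hot("automobile")))  # 0.0
print(cosine(one_hot("car"), one_hot("banana")))      # 0.0
```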
