IR - Midsem Question Paper - 2024 - Solutionfull
___________
Pandit Deendayal Energy University
Mid Semester Examination - September 2024
B. Tech. (SOT-CE) (Elective)
Semester – VII
Date: 23/09/2024
Course Name: Information Retrieval
Course Code: 20CP417T
Time: 2 hours
Max. Marks: 50
Instructions:
1. Do not write anything other than your roll number on the question paper.
2. Assume suitable data wherever essential and mention it clearly.
3. Writing appropriate units, nomenclature, and drawing neat sketches/schematics wherever required is an integral part of
the answer.
NOTE: All questions are compulsory; however, some internal choices are given.
Marks CO
Q1 Explain the term Information Retrieval and illustrate its goal. How is it different from Database Retrieval? [4*1 = 4] [CO2]
Difference: [2 marks]
IR | DR
Deals with unstructured or semi-structured data, such as documents, web pages, emails, or multimedia. | Operates on structured data, typically organized in tables with predefined schemas (like in relational databases).
Uses keyword-based queries or natural language input. | Uses formal query languages like SQL.
Queries can be vague or ambiguous, and the system ranks results by relevance using algorithms. | Queries are well-defined and expect an exact match, with structured conditions.
The system tries to return results that are most relevant to the query, even if they are not exact matches. | Returns all data that matches the query exactly, with no concept of ranking by relevance.
Data is indexed as an inverted index. | Data is stored in structured formats like tables, with rows and columns.
Q2 With respect to inverted index answer the following (any 4): [4*3 = 12] [CO1]
1. What are the possible components of a posting list?
The components of a posting list are: Doc ID, term frequency, positional information, skip pointers. [1 mark and 2 marks with full definition of each]
2. How do positional indexes differ from standard indexes? Explain with an example. [2 differences: 2 marks; example: 1 mark]
Positional Indexes | Standard Indexes
In addition to the document IDs, it also stores the exact positions (word offsets) of the term within the document. | Stores only the DocIDs of documents and the term frequency. It does not record the positions or locations of the terms within the document.
Along with keyword queries, it also supports phrase and proximity queries. | Supports only keyword queries; it cannot directly support phrase or proximity queries.
Q4 Construct an Inverted Index with Document Frequency and Positional Index Information for the given collection of three documents: [6*1 = 6] [CO4]
Document 1 (DocID = D1): "Machine learning is transforming industries."
Document 2 (DocID = D2): "Artificial intelligence and machine learning are
reshaping the future."
Document 3 (DocID = D3): "The future is learning for transformation."
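The construction asked for in Q4 can be sketched in Python as below. This is a minimal sketch, not the official answer key: it assumes terms are lowercased, punctuation is dropped, positions are word offsets starting at 1, and no stemming or stop-word removal is applied (so "transforming" and "transformation" stay separate terms). The function name `build_positional_index` is illustrative.

```python
import re

docs = {
    "D1": "Machine learning is transforming industries.",
    "D2": "Artificial intelligence and machine learning are reshaping the future.",
    "D3": "The future is learning for transformation.",
}

def build_positional_index(documents):
    """Map each term to {doc_id: [positions]}; positions start at 1 (assumption)."""
    index = {}
    for doc_id, text in documents.items():
        tokens = re.findall(r"[a-z]+", text.lower())  # lowercase, drop punctuation
        for position, term in enumerate(tokens, start=1):
            index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return dict(sorted(index.items()))

index = build_positional_index(docs)
for term, postings in index.items():
    # Document frequency is the number of documents in the term's posting list.
    print(f"{term} (df={len(postings)}): {postings}")
```

Under these assumptions the entry for "learning" has document frequency 3 with postings D1: [2], D2: [5], D3: [4], and "machine" has document frequency 2 with postings D1: [1], D2: [4].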
Q5. (a) State the significance of adopting SOUNDEX Algorithm for spelling correction. [2+5 = 7] [CO4]
To resolve spelling errors and homophones during query searching [2 marks]
(b) From the given set of terms find all the terms that have same SOUNDEX
codes.
Sea, Plane, Flower, Hear, See, Here, Flour, Barry, Burrow, Berry, Bury,
Smith, Smyth, Smithe
Hint :
1. B, F, P, V → 1
2. C, G, J, K, Q, S, X, Z → 2
3. D, T → 3
4. L → 4
5. M, N → 5
6. R → 6
Ans: [Marks = number of groups identified (5,4,3,2,1)]
1) Sea, See - S000
2) Flower, Flour - F460
3) Hear, Here - H600
4) Barry, Burrow, Berry, Bury - B600
5) Smith, Smyth, Smithe - S530
Plane - P450 (no other term in the set shares this code)
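The grouping above can be checked with a short Python sketch. It assumes the standard four-character Soundex convention (keep the first letter, encode the rest with the hint table, collapse adjacent equal digits, let H and W be transparent, pad with zeros). The names `soundex` and `groups` are illustrative.

```python
from collections import defaultdict

# Letter-to-digit table taken from the hint in the question.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(term):
    """Return the 4-character Soundex code of a non-empty alphabetic term."""
    word = term.upper()
    code = word[0]
    previous = CODES.get(word[0], "")
    for letter in word[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "HW":  # H and W do not break a run of equal digits
            previous = digit
    return (code + "000")[:4]

terms = ["Sea", "Plane", "Flower", "Hear", "See", "Here", "Flour", "Barry",
         "Burrow", "Berry", "Bury", "Smith", "Smyth", "Smithe"]

groups = defaultdict(list)
for term in terms:
    groups[soundex(term)].append(term)

for code, members in sorted(groups.items()):
    if len(members) > 1:  # only terms that share a code with another term
        print(code, members)
```

Run on the given terms, this reports the five shared codes B600, F460, H600, S000 and S530, and leaves Plane (P450) on its own.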
Q6. With reference to Query Processing answer the following (any 3): [3*3 = 9] [CO6]
a) Given a Boolean query with three posting lists, state and analyze how they will be resolved to produce a set of documents relevant to the user. [3 marks]
A Boolean query is resolved using the logical operators AND, OR and NOT. For a conjunctive (AND) query, a merge operation is applied across the posting lists, starting with the smallest posting list (the term with the lowest document frequency) and then merging the remaining lists in increasing order of size. This keeps the intermediate results small and so optimizes the merge operation across more than two posting lists.
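The merge order described in this answer can be sketched in Python as follows. This is a minimal sketch for an AND query only. The posting lists and the names `term_a`, `term_b`, `term_c`, `intersect` and `intersect_many` are made up for illustration, and each posting list is assumed to be sorted by DocID.

```python
def intersect(p1, p2):
    """Linear-time merge of two posting lists sorted by DocID."""
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def intersect_many(posting_lists):
    """AND together several posting lists, shortest (lowest df) first."""
    ordered = sorted(posting_lists, key=len)
    result = list(ordered[0])
    for postings in ordered[1:]:
        if not result:  # nothing left to match, stop early
            break
        result = intersect(result, postings)
    return result

# Hypothetical posting lists for a query: term_a AND term_b AND term_c
term_a = [1, 2, 4, 11, 31, 45, 173, 174]
term_b = [1, 2, 4, 5, 6, 16, 57, 132, 173]
term_c = [2, 31, 54, 101]

print(intersect_many([term_a, term_b, term_c]))  # [2]
```

Starting from `term_c` (four postings) means the intermediate result never holds more than four DocIDs, whatever the size of the other two lists.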
If there are too many skip pointers, more comparisons against skip pointers are needed and a lot of space is spent storing them. [1 mark]
If there are too few skip pointers, fewer comparisons against skip pointers are needed, but each skip spans a long stretch of the list, so there are few chances of a successful skip. [1 mark]
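The trade-off between many and few skip pointers can be explored with the Python sketch below. It is only a sketch: the answer does not say where skips are placed, so the code assumes the common heuristic of evenly spaced skips of length about sqrt(P) for a posting list of length P. The posting lists and the name `intersect_with_skips` are illustrative.

```python
import math

def intersect_with_skips(p1, p2):
    """Intersect two sorted posting lists, following skip pointers when safe."""
    # Assumed heuristic: a skip pointer at every k-th position, k ~ sqrt(length).
    k1 = max(1, int(math.sqrt(len(p1))))
    k2 = max(1, int(math.sqrt(len(p2))))
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if its target does not overshoot p2[j].
            if i % k1 == 0 and i + k1 < len(p1) and p1[i + k1] <= p2[j]:
                i += k1
            else:
                i += 1
        else:
            if j % k2 == 0 and j + k2 < len(p2) and p2[j + k2] <= p1[i]:
                j += k2
            else:
                j += 1
    return answer

list_a = [2, 4, 8, 16, 19, 23, 28, 43]
list_b = [1, 2, 3, 5, 8, 41, 51, 60, 71]
print(intersect_with_skips(list_a, list_b))  # [2, 8]
```

Replacing `k1` and `k2` with smaller values gives more pointers (more pointer comparisons, shorter spans), while larger values give fewer pointers with longer spans that succeed less often.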
d) How does the Permuterm Index enable efficient search for prefix, suffix, and infix wildcard queries?
A Permuterm Index is a data structure used in information retrieval systems, particularly in search engines, to support efficient wildcard searches. Wildcard searches allow users to search for variations of a word by using a symbol such as * to stand for zero or more characters. The permuterm index handles such queries by storing all possible rotations of each term, each linked back to the original term. [2 marks]
Add a $ to the end of each term, rotate the resulting string, and index every rotation in a B-tree. A wildcard query is rotated in the same way so that the * falls at the end, which turns it into a prefix lookup in the B-tree. [1 mark]
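The rotate-and-look-up idea can be sketched in Python as below. This is a minimal sketch under stated assumptions: a sorted list searched with `bisect` stands in for the B-tree, the query contains exactly one `*`, and the vocabulary and the names `rotations`, `build_permuterm_index` and `wildcard_lookup` are made up for illustration.

```python
import bisect

def rotations(term):
    """All rotations of term + '$', where '$' marks the end of the term."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]

def build_permuterm_index(terms):
    """Sorted (rotation, original term) pairs; a stand-in for the B-tree."""
    return sorted((rot, term) for term in terms for rot in rotations(term))

def wildcard_lookup(index, query):
    """Answer a query with exactly one '*' (assumption) via a prefix scan."""
    marked = query + "$"
    star = marked.index("*")
    # Rotate so the '*' is at the end, then drop it: X*Y becomes prefix Y$X.
    prefix = marked[star + 1:] + marked[:star]
    position = bisect.bisect_left(index, (prefix,))
    matches = set()
    while position < len(index) and index[position][0].startswith(prefix):
        matches.add(index[position][1])
        position += 1
    return sorted(matches)

vocabulary = ["man", "moon", "moron", "hello"]
permuterm = build_permuterm_index(vocabulary)

print(wildcard_lookup(permuterm, "mo*"))  # prefix query: ['moon', 'moron']
print(wildcard_lookup(permuterm, "*on"))  # suffix query: ['moon', 'moron']
print(wildcard_lookup(permuterm, "m*n"))  # infix query:  ['man', 'moon', 'moron']
```

All three query shapes reduce to the same operation, a prefix search over the rotations, which is what a B-tree answers efficiently.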
Q7. Given a search query Q = “data is future”, rank the given retrieved documents using cosine similarity in the vector space model with the Inverse Document Frequency formulation. [9*1 = 9] [CO3]
Document 1: "Data science is the future."
Document 2: "Data drives intelligence."
Document 3: "Science and data are future."
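One way to work through Q7 is sketched below in Python. The question leaves the weighting details open, so this sketch makes the following assumptions, which are not taken from the answer key: terms are lowercased with punctuation dropped, no stop-word removal or stemming, raw term frequency, idf = log10(N / df), and the same tf-idf weighting for the query. The names `tokenize` and `rank` are illustrative.

```python
import math
import re
from collections import Counter

docs = {
    "D1": "Data science is the future.",
    "D2": "Data drives intelligence.",
    "D3": "Science and data are future.",
}
query = "data is future"

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def rank(documents, query_text):
    """Return (doc_id, cosine score) pairs, best match first."""
    doc_tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(documents)
    df = Counter(term for tokens in doc_tokens.values() for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}

    def vector(tokens):
        # tf-idf weights; terms unseen in the collection are ignored.
        tf = Counter(tokens)
        return {term: tf[term] * idf[term] for term in tf if term in idf}

    def cosine(u, v):
        dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q_vec = vector(tokenize(query_text))
    scores = {doc_id: cosine(q_vec, vector(tokens))
              for doc_id, tokens in doc_tokens.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank(docs, query)
for doc_id, score in ranking:
    print(doc_id, round(score, 4))
```

Under these assumptions "data" occurs in all three documents, so its idf is 0 and it does not affect the scores. The resulting order is D1 (about 0.707), then D3 (about 0.085), then D2 (0). A different tf or idf variant, or stop-word removal, could change the scores.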
***********