IR - Midsem Question Paper - 2024 - Solutionfull

This document is an examination paper for the B. Tech. (SOT-CE) course on Information Retrieval at Pandit Deendayal Energy University, dated September 23, 2024. It includes various questions related to information retrieval concepts, inverted indexes, edit distance, SOUNDEX algorithm, query processing, and ranking documents using cosine similarity. The exam consists of mandatory questions with internal choices and has a maximum score of 50 marks.

Roll No. ___________
Pandit Deendayal Energy University
Mid Semester Examination - September 2024
B. Tech. (SOT-CE) (Elective)
Semester – VII
Date: 23/09/2024
Course Name : Information Retrieval Time: 2 hours
Course Code : 20CP417T Max. Marks: 50
Instructions:
1. Do not write anything other than your roll number on the question paper.
2. Assume suitable data wherever essential and mention it clearly.
3. Writing appropriate units, nomenclature, and drawing neat sketches/schematics wherever required is an integral part of
the answer.
NOTE: All questions are mandatory to attend, however some internal choices are given.
Marks CO
Q1 Explain the term Information Retrieval and illustrate its goal. How is it different from Database Retrieval? [4*1=4] [CO2]

Ans: Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). [1 mark]

Goals: To retrieve documents with information that is relevant to the user’s information need and helps the user complete a task. A good retrieval model will find documents that are likely to be considered relevant by the person who submitted the query. [1 mark]

Difference: [2 marks]
IR | DR
Deals with unstructured or semi-structured data, such as documents, web pages, emails, or multimedia. | Operates on structured data, typically organized in tables with predefined schemas (like in relational databases).
Uses keyword-based queries or natural language input. | Uses formal query languages like SQL.
Queries can be vague or ambiguous, and the system ranks results by relevance using algorithms. | Queries are well-defined and expect an exact match, with structured conditions.
The system tries to return results that are most relevant to the query, even if they are not exact matches. | Returns all data that matches the query exactly, with no concept of ranking by relevance.
Data is indexed as an inverted index. | Data is stored in structured formats like tables, with rows and columns.
Q2 With respect to inverted index answer the following (any 4): [4*3=12] [CO1]
1. What are the possible components of a posting list?
The components of a posting list are: Doc ID, term frequency, positional information, skip pointers. [1 mark and 2 marks with full definition of each]
2. How do positional indexes differ from standard indexes? Explain with an example. [2 differences 2 marks, 1 mark with example]
Positional Indexes | Standard Indexes
In addition to the document IDs, it also stores the exact positions (word offsets) of the term within the document. | Stores only the DocIDs of documents and term frequency. It does not record the positions or locations of the terms within the document.
Along with keyword queries it also supports phrase and proximity queries. | Cannot directly support phrase queries or proximity queries; only keyword queries are supported.
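The difference in the table above can be illustrated with a minimal Python sketch. The two-document collection and the function names `build_indexes` and `phrase_query` are hypothetical and chosen only for illustration; tokenization is a plain whitespace split.

```python
from collections import defaultdict

def build_indexes(docs):
    """Return (standard, positional) indexes for a dict of doc_id -> text."""
    standard = defaultdict(set)                          # term -> {doc_id}
    positional = defaultdict(lambda: defaultdict(list))  # term -> {doc_id: [positions]}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split(), start=1):
            standard[term].add(doc_id)
            positional[term][doc_id].append(pos)
    return standard, positional

def phrase_query(positional, first, second):
    """Documents in which `second` occurs immediately after `first`."""
    hits = set()
    for doc_id, positions in positional[first].items():
        following = set(positional[second].get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits

docs = {  # hypothetical collection
    "D1": "machine learning is fun",
    "D2": "learning about a machine",
}
standard, positional = build_indexes(docs)

# A standard index can only answer the keyword query "machine AND learning".
and_hits = standard["machine"] & standard["learning"]
# A positional index can also answer the phrase query "machine learning".
phrase_hits = phrase_query(positional, "machine", "learning")
print(sorted(and_hits), sorted(phrase_hits))
```

Both documents satisfy the keyword query, but only D1 contains the two words next to each other in that order, which is information the standard index does not hold.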

3. What are the main difficulties in determining vocabulary terms in languages characterized by complicated word structures? Elaborate them. --- [3 marks]
--- [2 marks without examples]
- Hyphenated phrases like co-education, state-of-the-art
- Numeric data like dates and phone numbers, e.g. 20/3/91, 3/20/91, Mar 20, 1991, B-52, 100.2.86.144, (800) 234-2333, 800.234.2333
- No whitespace between words, e.g. Chinese
- Use of apostrophes
- Language-specific text
- Accents and diacritics
- Multilingual text
- Language reading direction, e.g. Arabic

4. What are the potential challenges of Boolean retrieval when dealing with large document collections or ambiguous queries? --- [3 marks]
- It has no provision for ranking documents.
- It applies only logical operators, so it returns exact matches without taking context or degree of relevance into account.
- It cannot handle phrase queries.
- For a large document collection, storage is an issue because the term-document incidence matrix is extremely sparse.
- Queries using "OR" operators can return an overwhelming number of results, especially in large collections. This is because any document that matches any of the terms will be included, leading to low precision.
- Queries using "AND" or "NOT" operators may exclude important documents if they lack one or more of the specified terms, even if they are relevant to the user’s information need. This results in lower recall.

5. How is a weighted term-document matrix more beneficial than a binary term-document incidence matrix in Information Retrieval? --- [3 marks]

In a binary term-document matrix, entries are either 0 or 1, indicating only the presence or absence of a term in a document. This approach doesn't consider how important or frequent a term is within a document or across the collection. Since the binary matrix does not account for term frequency, all terms are treated equally in relevance calculations.
The weighted matrix assigns numerical weights to terms based on their significance. This allows for distinguishing between terms that are more informative or relevant for a document and common terms that carry less significance. By applying weights like TF-IDF (Term Frequency-Inverse Document Frequency), the matrix downweights the common terms and boosts the importance of rare but informative terms, improving the ranking of documents.
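The contrast between the two matrices can be sketched in Python as follows. The three short documents are hypothetical, and the weighting used here (raw term frequency multiplied by log10(N/df)) is one common TF-IDF variant assumed for illustration; other variants exist.

```python
import math
from collections import Counter

docs = {  # hypothetical collection
    "D1": "the data data science",
    "D2": "the data mining",
    "D3": "the science fiction",
}

term_counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
n_docs = len(docs)
# Document frequency: number of documents that contain each term.
df = Counter(term for counts in term_counts.values() for term in counts)

# Binary incidence: 1 if the term occurs in the document (absent terms are implicit 0).
binary = {d: {t: 1 for t in counts} for d, counts in term_counts.items()}
# Weighted: tf * idf, with idf = log10(N / df) (assumed variant).
weighted = {
    d: {t: tf * math.log10(n_docs / df[t]) for t, tf in counts.items()}
    for d, counts in term_counts.items()
}

print(binary["D1"])
print(weighted["D1"])
```

In the binary row for D1, "the", "data", and "science" all get 1. In the weighted row, "the" gets weight 0 because it occurs in every document, and "data" outweighs "science" because it occurs more often in D1.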
Q3 Find the Edit Distance between the terms "intention" and "execution". [3*1=3] [CO6]
Ans: 5 [1 mark] and matrix [2 marks]
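The matrix itself is not reproduced here. As a minimal sketch, the dynamic-programming table can be generated as below, assuming unit cost for insertion, deletion, and substitution (Levenshtein distance), which is the cost model under which the answer 5 is obtained. The function name `edit_distance_matrix` is an illustrative choice.

```python
def edit_distance_matrix(source, target):
    """Levenshtein DP table; cell [i][j] is the distance between
    source[:i] and target[:j], with unit cost for every operation."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i          # delete all i characters of the source prefix
    for j in range(cols):
        dist[0][j] = j          # insert all j characters of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                  # deletion
                dist[i][j - 1] + 1,                  # insertion
                dist[i - 1][j - 1] + substitution,   # match or substitution
            )
    return dist

matrix = edit_distance_matrix("intention", "execution")
for row in matrix:
    print(" ".join(f"{cell:2d}" for cell in row))
print("edit distance:", matrix[-1][-1])
```

The bottom-right cell of the printed table is the edit distance. If substitution were charged a cost of 2 instead of 1, the result would differ.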

Q4 Construct an Inverted Index with Document Frequency and Positional Index Information for the given collection of three documents: [6*1=6] [CO4]
Document 1 (DocID = D1): "Machine learning is transforming industries."
Document 2 (DocID = D2): "Artificial intelligence and machine learning are reshaping the future."
Document 3 (DocID = D3): "The future is learning for transformation."

Note: Apply stop word removal and lemmatization wherever necessary to extract terms, as per your understanding.
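No worked answer is given for this question. One possible construction is sketched below in Python under explicit assumptions: the stop list (`STOP_WORDS`) and the hand-written lemma table (`LEMMAS`) are hypothetical, "learning" and "transformation" are left unchanged, and positions are 1-based word offsets counted before stop word removal. Other reasonable choices would give a different but equally valid index.

```python
from collections import defaultdict

STOP_WORDS = {"is", "and", "are", "the", "for"}   # assumed stop list
LEMMAS = {                                        # hand-written, hypothetical lemma table
    "transforming": "transform",
    "industries": "industry",
    "reshaping": "reshape",
}

docs = {
    "D1": "Machine learning is transforming industries.",
    "D2": "Artificial intelligence and machine learning are reshaping the future.",
    "D3": "The future is learning for transformation.",
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    return [word.strip(".,").lower() for word in text.split()]

def build_positional_index(docs):
    """term -> {doc_id: [positions]}; positions are offsets in the original text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text), start=1):
            if token in STOP_WORDS:
                continue
            term = LEMMAS.get(token, token)
            index[term].setdefault(doc_id, []).append(pos)
    return index

index = build_positional_index(docs)
for term in sorted(index):
    postings = index[term]
    # Document frequency is the number of documents in the postings.
    print(f"{term} (df={len(postings)}): {postings}")
```

Under these assumptions, "learning" has document frequency 3 with positions D1: 2, D2: 5, D3: 4.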

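Q5. (a) State the significance of adopting the SOUNDEX algorithm for spelling correction. [2+5=7] [CO4]
It resolves spelling errors and homophones during query searching: terms that sound alike are mapped to the same code, so a misspelled query term can still match the intended term. [2 marks]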
(b) From the given set of terms find all the terms that have same SOUNDEX
codes.
Sea, Plane, Flower, Hear, See, Here, Flour, Barry, Burrow, Berry, Bury,
Smith, Smyth, Smithe
Hint :
1. B, F, P, V → 1
2. C, G, J, K, Q, S, X, Z → 2
3. D, T → 3
4. L→4
5. M, N → 5
6. R→6
Ans: [Marks = number of groups identified (5,4,3,2,1)]
1) Sea, See --- S000
2) Flower, Flour --- F460
3) Hear, Here --- H600
4) Barry, Burrow, Berry, Bury --- B600
5) Smith, Smyth, Smithe --- S530
Plane --- P450
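The grouping above can be checked with a short Python sketch. It assumes the common textbook form of the algorithm: keep the first letter, map the remaining letters with the hint table (vowels, H, W, and Y become 0), collapse runs of identical digits, drop the zeros, and pad to four characters. The function name `soundex` and this exact ordering of steps are assumptions; other Soundex variants treat H and W slightly differently, which does not affect these terms.

```python
from collections import defaultdict

# Letter classes from the hint table in the question.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(term):
    """Four-character code: first letter followed by three digits."""
    word = term.upper()
    # Vowels, H, W and Y are not in CODES and are mapped to "0".
    digits = [CODES.get(letter, "0") for letter in word[1:]]
    collapsed = []
    for digit in digits:
        if not collapsed or collapsed[-1] != digit:   # collapse runs of equal digits
            collapsed.append(digit)
    body = "".join(d for d in collapsed if d != "0")  # drop the zeros
    return (word[0] + body + "000")[:4]               # pad with zeros, keep 4 chars

terms = ["Sea", "Plane", "Flower", "Hear", "See", "Here", "Flour", "Barry",
         "Burrow", "Berry", "Bury", "Smith", "Smyth", "Smithe"]

groups = defaultdict(list)
for term in terms:
    groups[soundex(term)].append(term)

for code, members in groups.items():
    print(code, members)
```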
Q6. With reference to Query Processing answer the following (any 3): [3*3=9] [CO6]
a) Given a Boolean query with three posting lists, state and analyze how they will be resolved to produce a set of documents relevant to the user. [3 marks]
Any Boolean query is resolved using logical operators like AND, OR and NOT. For a conjunctive (AND) query, a merge operation is applied to the posting lists, starting with the smallest posting list (that is, ordering the terms by document frequency). Each intermediate result is then merged with the next smallest list. Because intermediate results can never be larger than the smallest list processed so far, this ordering optimizes the merge operation across more than two posting lists.
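A minimal Python sketch of this strategy for an AND query over three lists follows. The posting lists `a`, `b`, and `c` are hypothetical sorted lists of document IDs, and the function names are illustrative.

```python
def intersect(p1, p2):
    """Linear merge of two sorted posting lists (AND)."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def intersect_many(posting_lists):
    """AND several posting lists, processing the shortest list first."""
    ordered = sorted(posting_lists, key=len)   # increasing document frequency
    result = ordered[0]
    for postings in ordered[1:]:
        if not result:                         # nothing left to match
            break
        result = intersect(result, postings)
    return result

# Hypothetical posting lists (sorted document IDs) for three query terms.
a = [1, 2, 4, 11, 31, 45, 173, 174]
b = [1, 2, 4, 5, 6, 16, 57, 132]
c = [2, 31, 54, 101]

print(intersect_many([a, b, c]))
```

Here `c` is processed first, so the first intermediate result has at most four entries before the longer lists are scanned.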

b) Distinguish between Stemming and lemmatization [2 marks] with an example [1 mark].
Differences
Output Form: Stemming may produce non-words or root forms, whereas lemmatization produces actual dictionary words.
Complexity: Lemmatization is more complex and computationally intensive because it involves understanding the context and the grammatical form of the word.
Accuracy: Lemmatization is generally more accurate than stemming, especially in handling irregular forms.
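Example
Original Sentence: "He was enjoying the beautiful singing."
Stemmed Sentence: "He was enjoy the beauti sing."
Lemmatized Sentence: "He be enjoy the beautiful singing." (the irregular form "was" maps to its lemma "be"; "singing" is a noun here, so it is already a dictionary form)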

c) How do skip pointers improve the efficiency of intersecting large postings lists in information retrieval? What are the potential drawbacks of using too many or too few skip pointers in a postings list?
Skip pointers are effectively shortcuts that allow us to avoid processing parts of the postings list that will not figure in the search results. [1 mark]
If there are too many skip pointers, the skip spans are short: skips succeed often, but there are many comparisons against skip pointers and a lot of space is spent storing them. [1 mark]
If there are too few skip pointers, there are fewer comparisons against skip pointers, but the skip spans are long, so there are few opportunities to skip. [1 mark]
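The trade-off can be explored with the following Python sketch of intersection with skip pointers. It assumes the commonly used heuristic of placing about √P evenly spaced skip pointers on a list of length P, and it models the pointers implicitly through index arithmetic instead of storing them. The function names are illustrative.

```python
import math

def can_skip(postings, index, span, target):
    """True if a skip pointer starts at `index` and its destination
    does not overshoot `target`."""
    return (index % span == 0
            and index + span < len(postings)
            and postings[index + span] <= target)

def intersect_with_skips(p1, p2):
    """Intersect two sorted posting lists, following skip pointers when safe."""
    span1 = max(1, int(math.sqrt(len(p1))))   # assumed sqrt(P) spacing
    span2 = max(1, int(math.sqrt(len(p2))))
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Everything between i and the skip target is smaller than p2[j].
            i += span1 if can_skip(p1, i, span1, p2[j]) else 1
        else:
            j += span2 if can_skip(p2, j, span2, p1[i]) else 1
    return answer

p1 = [2, 4, 8, 16, 19, 23, 28, 43]
p2 = [1, 2, 3, 5, 8, 41, 51, 60, 71]
print(intersect_with_skips(p1, p2))
```

Shrinking the span adds more pointer checks per step, while growing it makes the `postings[index + span] <= target` condition succeed less often, which mirrors the two drawbacks described above.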
d) How does the Permuterm Index enable efficient search for prefix, suffix, and infix wildcard queries?
A Permuterm Index is a data structure used in information retrieval systems, particularly in search engines, to support efficient wildcard searches. Wildcard searches allow users to search for variations of a word by using symbols like * to stand for any sequence of characters. The permuterm index helps in handling such queries by storing all possible rotations of a term, each pointing back to the term itself. [2 marks]
Add a $ to the end of each term, rotate the resulting string, and index all rotations in a B-tree. At query time, rotate the query so that the * is at the end; the query then becomes a prefix lookup in the B-tree. [1 mark]
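A minimal Python sketch of the idea follows. A sorted list searched with `bisect` stands in for the B-tree, the vocabulary is hypothetical, and the lookup handles queries with exactly one `*` only; these are simplifying assumptions for illustration.

```python
import bisect

def rotations(term):
    """All rotations of term + '$'."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]

def build_permuterm_index(vocabulary):
    """Sorted list of (rotation, term) pairs, standing in for a B-tree."""
    return sorted((rot, term) for term in vocabulary for rot in rotations(term))

def wildcard_lookup(index, query):
    """Answer a query containing exactly one '*', e.g. 'mo*', '*n', 'm*n'."""
    prefix, _, suffix = query.partition("*")
    key = suffix + "$" + prefix           # rotate so that '*' is at the end
    start = bisect.bisect_left(index, (key,))
    matches = set()
    for rotation, term in index[start:]:  # rotations sharing the key are contiguous
        if not rotation.startswith(key):
            break
        matches.add(term)
    return sorted(matches)

vocabulary = ["man", "moon", "mat", "nominal"]   # hypothetical vocabulary
index = build_permuterm_index(vocabulary)

print(wildcard_lookup(index, "mo*"))   # prefix query
print(wildcard_lookup(index, "*n"))    # suffix query
print(wildcard_lookup(index, "m*n"))   # wildcard in the middle
```

For the query m*n, the rotated key is n$m, and every term that starts with m and ends with n has a rotation beginning with that key.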

Q7. Given a Query for searching, Q = “data is future”. Rank the given retrieved documents using cosine similarity and using vector space model with Inverse Document Frequency formulation. [9*1=9] [CO3]
Document 1: "Data science is the future."
Document 2: "Data drives intelligence."
Document 3: "Science and data are future."

Note: Apply stop word removal and lemmatization wherever necessary to extract terms, as per your understanding.
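No worked answer is given for this question. One way to carry out the computation is sketched below in Python under explicit assumptions: the stop list (`STOP_WORDS`) and lemma table (`LEMMAS`) are hypothetical, term weights are raw term frequency multiplied by idf = log10(N/df), and the query is weighted the same way as the documents. A different weighting scheme or preprocessing choice could change the scores.

```python
import math
from collections import Counter

STOP_WORDS = {"is", "the", "and", "are"}   # assumed stop list
LEMMAS = {"drives": "drive"}               # hand-written, hypothetical lemma table

docs = {
    "D1": "Data science is the future.",
    "D2": "Data drives intelligence.",
    "D3": "Science and data are future.",
}
query = "data is future"

def extract_terms(text):
    """Lowercase, strip punctuation, remove stop words, apply lemma table."""
    tokens = [word.strip(".,").lower() for word in text.split()]
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]

doc_counts = {doc_id: Counter(extract_terms(text)) for doc_id, text in docs.items()}
n_docs = len(docs)
df = Counter(term for counts in doc_counts.values() for term in counts)
idf = {term: math.log10(n_docs / df[term]) for term in df}

def tf_idf(counts):
    """Sparse vector term -> tf * idf; terms unseen in the collection get 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in counts.items()}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

query_vector = tf_idf(Counter(extract_terms(query)))
scores = {doc_id: cosine(query_vector, tf_idf(counts))
          for doc_id, counts in doc_counts.items()}

for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

Under these assumptions, "data" occurs in all three documents, so its idf is 0 and it contributes nothing to any score. Documents 1 and 3 then tie on the term "future", and Document 2 scores 0.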

***********
