All Unit 2 Mark

The document provides concise 2-mark answers for various topics related to Information Retrieval Techniques across multiple units. Key concepts covered include comparisons between Information Retrieval and Web Search, types of search engines, evaluation metrics for classifiers, and techniques for handling web content. It also discusses machine learning algorithms, clustering methods, and the importance of link analysis in web search.

Uploaded by rahul2004.x

All the best for your IRT exam tomorrow!

Here are clear and concise 2-mark answers from Unit 1, just the way you
asked:

1. Compare Information Retrieval and Web Search. (Apr 17)

Information Retrieval (IR):

 Focuses on retrieving relevant documents from a large collection based on user queries.

 Works on structured or unstructured data (e.g., books, research papers).

Web Search:

 A type of IR that deals specifically with the World Wide Web.

 Involves crawling, indexing web pages, and ranking using algorithms (like Google).

2. Identify the need of information retrieval. (Apr 22)

 Helps users find relevant information from large collections.

 Organizes unstructured data (like text) to provide meaningful search results.

 Supports decision making, research, and data analysis.

3. Enumerate the purpose of Information Retrieval. (Apr 21)

 To provide relevant documents to users based on their queries.

 To filter and rank information from large datasets.

 To improve access to knowledge and support efficient searching.

4. Identify the types of search engines. (Apr 21)

 Crawler-based search engines (e.g., Google, Bing)

 Human-powered directories (e.g., DMOZ – now closed)

 Hybrid search engines (combine both, e.g., Yahoo)

 Metasearch engines (e.g., Dogpile – fetch results from multiple engines)

5. What are the major components of traditional IRT? (Apr 24)

 Document collection – Set of documents to be searched.

 Indexing – Organizing content for fast retrieval.

 Query processing – Interprets user input.


 Retrieval model – Ranks documents based on relevance.

 Evaluation module – Measures performance of the system.

6. What are the components of search engine? (Apr/May 2023)

 Crawler – Collects web pages.

 Indexer – Stores and organizes data.

 Query Processor – Matches user queries with indexed data.

 Ranking Module – Orders results based on relevance.

 User Interface – Displays search results to the user.

7. List the issues in information retrieval system. (Apr/May 2023)

 Ambiguity in queries

 Synonyms and polysemy

 Ranking and relevance mismatch

 Scalability for large datasets

 Handling of unstructured data

 Evaluation of IR system performance

8. Discuss the use of hypertext in IR.

 Hypertext contains links (hyperlinks) connecting related documents.

 Enhances navigation and retrieval by providing multiple paths to related content.

 Common in web-based IR systems like search engines.

 Improves user experience by allowing dynamic access to information.

9. What is “The e-Publishing era”? (Apr 24)

The e-Publishing era refers to the time when content is created, distributed, and accessed digitally rather than in print.

 It enables fast, wide, and cost-effective distribution of books, journals, and documents.

 Examples: eBooks, online journals, blogs, and digital libraries.

10. Define central repository.


A central repository is a central storage location where all data, documents, or information are stored, managed, and maintained.

 It ensures easy access, backup, version control, and data sharing.

 Example: GitHub for code, DSpace for digital libraries.

11. What is investigation?

Investigation in IR refers to the process of exploring, searching, and analyzing data or documents to find useful or relevant information.

 It helps users dig deeper into the data for insights or answers to queries.

12. What is sensemaking?

Sensemaking is the process of interpreting and understanding information to make informed decisions.

 In IR, it helps users understand search results and connect information meaningfully.

 It involves organizing, filtering, and giving meaning to retrieved data.

Let me know if you want these compiled in a PDF or want answers for Unit 2 next!
Here are the Unit 2 – Information Retrieval Technique 2-mark answers in a clear and simple way for your exam prep:

1. Define Unigram language model and Bigram language model.

 Unigram Model: Assumes each word in a document is independent. Probability of a sentence is the product of probabilities of individual words.

 Bigram Model: Considers the probability of a word based on the previous word. It captures some word
dependencies.
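As a quick sketch, both models can be estimated with simple counts. This uses a made-up six-word corpus and maximum-likelihood estimates (both are assumptions for illustration only):

```python
from collections import Counter

# Toy corpus; the text is an assumption for illustration.
tokens = "the cat sat on the mat".split()

unigram = Counter(tokens)                     # word counts
bigram = Counter(zip(tokens, tokens[1:]))     # adjacent word-pair counts
total = len(tokens)

def p_unigram(w):
    # Unigram model: P(w) = count(w) / N, each word independent.
    return unigram[w] / total

def p_bigram(w, prev):
    # Bigram model: P(w | prev) = count(prev, w) / count(prev).
    return bigram[(prev, w)] / unigram[prev]

# Unigram sentence probability = product of individual word probabilities.
p_sentence = p_unigram("the") * p_unigram("cat")
```

Under the unigram model P("the cat") multiplies the two word probabilities; the bigram model would instead use P("cat" | "the"), capturing the dependency on the previous word.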

2. Define Latent Semantic Indexing (LSI).

LSI is a technique that reduces the dimensionality of the term-document matrix using Singular Value Decomposition (SVD) to capture hidden relationships between terms and documents.
It helps overcome synonymy and polysemy in retrieval.
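A minimal sketch of the LSI step with NumPy; the term-document matrix below is a made-up toy example, and k = 2 latent dimensions is an arbitrary choice:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); values are assumptions.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The LSI step: keep only the k largest singular values (latent dimensions).
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Queries and documents are then compared in this reduced k-dimensional space, which is what lets terms that co-occur (synonyms) land near each other.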

3. What is zone index?

A zone index divides a document into multiple zones (like title, abstract, body) and assigns weights to terms based on the zone.
Terms in important zones like the title may be given higher importance.

4. Define relevance feedback. (Apr 21, Apr 23)

Relevance feedback is a technique where the system improves search results based on user feedback.
If a user marks results as relevant or irrelevant, the system refines future queries for better accuracy.

5. Compare Term Frequency and Inverse Document Frequency. (Apr 21)

 Term Frequency (TF): Measures how often a term appears in a document.

 Inverse Document Frequency (IDF): Measures how rare a term is across all documents.
→ TF shows local importance, IDF shows global uniqueness.

6. Explain Query Expansion.

Query Expansion is the process of enhancing a user query by adding related or synonymous terms.
It helps retrieve more relevant results by reducing vocabulary mismatch between the query and documents.

7. Draw the neural network model. (Apr 24)

A basic neural network includes:

 Input layer – Takes the query/document features


 Hidden layer(s) – Applies transformations using weights and activations

 Output layer – Predicts relevance score

(You can draw a 3-layer diagram with arrows showing input → hidden → output)

8. Define Term Weighting and list its two factors. (Apr 23, Apr 24)

Term Weighting assigns a numerical score to terms to reflect their importance.


Two factors:

1. Term Frequency (TF)

2. Inverse Document Frequency (IDF)

9. Define TF-IDF.

TF-IDF (Term Frequency – Inverse Document Frequency) is a term weighting scheme that gives high scores to terms that occur frequently in a document but rarely in the entire collection.

10. Explain TF-IDF scheme.

 TF indicates how often a term appears in a document.

 IDF reduces the weight of common terms that appear in many documents.

 TF-IDF = TF × IDF
→ Helps identify important and unique terms in documents.
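The TF × IDF formula above can be worked through directly in Python. This sketch uses the two-document toy collection from question 15 below and the plain idf = log(N/df) variant (real systems often add smoothing, which is omitted here):

```python
import math

# Toy collection; documents are assumptions for illustration.
docs = [["data", "science", "is", "powerful"],
        ["science", "is", "fun"]]
N = len(docs)

def tf(term, doc):
    # Raw term frequency within one document.
    return doc.count(term)

def idf(term):
    # idf = log(N / df), where df = number of documents containing the term.
    df = sum(1 for d in docs if term in d)
    return math.log(N / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)
```

Note that "science" appears in both documents, so its IDF (and hence its TF-IDF) is 0: a term occurring everywhere carries no discriminating power, while "data" scores higher because it is unique to one document.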

11. Define Precision and Recall.

 Precision = (Relevant documents retrieved) / (Total documents retrieved)

 Recall = (Relevant documents retrieved) / (Total relevant documents available)


→ Precision focuses on accuracy, Recall on completeness.
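Both ratios are one-liners over sets of document IDs. The result sets below are hypothetical numbers chosen only to make the arithmetic visible:

```python
def precision_recall(retrieved, relevant):
    # retrieved, relevant: sets of document IDs
    hits = len(retrieved & relevant)      # relevant documents retrieved
    precision = hits / len(retrieved)     # accuracy of what was returned
    recall = hits / len(relevant)         # coverage of what should have been returned
    return precision, recall

# Hypothetical run: 4 documents retrieved, 3 relevant among them, 6 relevant overall.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9})
```

Here precision is 3/4 = 0.75 and recall is 3/6 = 0.5, showing how a system can be accurate yet incomplete.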

12. What is inversion in indexing process?

Inversion is the process of creating an inverted index, where for each term, a list of documents in which it appears is stored.
It allows fast retrieval of documents matching a query term.

13. Define an Information Retrieval Model.

An Information Retrieval Model is a framework that defines how documents are represented, matched, and ranked based on a query.
Examples: Boolean Model, Vector Space Model, Probabilistic Model.
Here are the 2-mark answers for the questions you asked from Unit 2 – IRT:

14. Define Document Preprocessing

Document preprocessing is the process of cleaning and preparing text data before indexing or retrieval.
It includes steps like:

 Tokenization (splitting text into words)

 Stop word removal (removing common words like "the", "and")

 Stemming or Lemmatization (reducing words to their root form)

 Normalization (like converting to lowercase)
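The four steps above can be chained into one small pipeline. This is a sketch only: the stop list is a tiny made-up set, and the final step is a crude suffix-stripping stand-in for real stemming (a production system would use something like the Porter stemmer):

```python
import re

STOP_WORDS = {"the", "and", "is", "a"}   # tiny illustrative stop list (assumption)

def preprocess(text):
    # Normalization: convert to lowercase.
    text = text.lower()
    # Tokenization: split into alphanumeric word tokens.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude suffix stripping as a stand-in for stemming.
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens
```

For example, preprocess("The cats and the dog") yields ["cat", "dog"]: lowercased, stop words dropped, plural reduced.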

15. Design an inverted index with an example. (Apr 22)

An inverted index maps terms to the list of documents (called posting list) where they appear.

Example:

Documents:

 D1: "Data science is powerful"

 D2: "Science is fun"

Inverted Index:

Term → Document List

-----------------------------

data → D1

science → D1, D2

is → D1, D2

powerful → D1

fun → D2

This structure allows fast lookup of documents that contain a specific word.
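The same index can be built in a few lines; the sketch below reproduces the D1/D2 example above, using sets as posting lists for simplicity (real indexes use sorted lists with positions and frequencies):

```python
from collections import defaultdict

def build_inverted_index(docs):
    # docs: mapping of document ID -> text
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)   # inversion: term -> documents containing it
    return index

index = build_inverted_index({
    "D1": "Data science is powerful",
    "D2": "Science is fun",
})
```

Looking up index["science"] immediately returns {"D1", "D2"} without scanning any document text, which is exactly the fast lookup the answer describes.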

Let me know if you want these compiled or if you're moving to Unit 3 next!
Here are the 2-mark answers for Unit 3 – Information Retrieval Techniques, explained clearly and concisely for exam prep:

1. List the evaluation metrics to assess the quality of single-label text classifiers

 Precision

 Recall

 F1-score

 Accuracy

 Confusion Matrix

2. Define precision and recall metrics in text classification

 Precision = Relevant documents retrieved / Total documents retrieved

 Recall = Relevant documents retrieved / Total relevant documents available

3. What is the space complexity of full inverted index?

The space complexity of a full inverted index is O(t × d) in the worst case, where t = number of terms and d = number of documents; in practice it is proportional to the total number of term occurrences.

4. Construct the i) inverted index and ii) occurrence as inverted list

Document collection:

 d1: To do is to be. To be is to do

 d2: To be or not to be. I am what i am

 d3: I think therefore i am. Do be do be do

 d4: Do do do, da da da

i) Inverted Index:

Term → Documents

--------------------------

to → d1, d2

do → d1, d3, d4

is → d1

be → d1, d2, d3

or → d2

not → d2

i → d2, d3

am → d2, d3

what → d2

think → d3

therefore → d3

da → d4

ii) Occurrence as Inverted List:

Term → { (d1, 2), (d2, 2), (d3, 2) } ← for term "be"

(Shows term frequency in each document)

5. Define single-link, complete-link and average-link algorithm

 Single-link: Distance between two clusters = shortest distance between points.

 Complete-link: Distance = farthest distance between points.

 Average-link: Distance = average distance between all point pairs across clusters.

6. List the basic types of machine learning algorithms (based on learning mechanism)

 Supervised Learning

 Unsupervised Learning

 Semi-supervised Learning

 Reinforcement Learning

7. What do you mean by supervised learning?

Supervised learning is a machine learning method where the model is trained on labeled data, meaning both input and
correct output are provided.

8. Define Text Clustering

Text clustering is the process of grouping similar text documents into clusters based on their content without using
labels.

9. Distinguish Supervised and Unsupervised Learning (Apr 22)


Supervised Learning Unsupervised Learning

Uses labeled data Uses unlabeled data

Example: Classification Example: Clustering

Output is known during training Output is unknown during training

10. Define the text classification problem

Text classification is the task of assigning a category label to a text document based on its content using machine learning.

11. List the basic types of unsupervised algorithms

 K-means

 Hierarchical clustering

 DBSCAN

 PCA (for dimensionality reduction)

12. List the supervised text classification algorithms

 Naive Bayes

 Decision Trees

 Support Vector Machines (SVM)

 Logistic Regression

 Random Forest

13. Write the Bisecting k-means algorithm (Apr 24)

1. Start with all documents in one cluster

2. Repeat until k clusters are formed:


a. Pick the largest cluster
b. Split it into two using K-means
c. Choose the split with the best score (e.g., SSE)
d. Add the two new clusters to the list

14. Explain K-means clustering – Batch Mode

In batch mode, all data points are considered at once:


 Assign points to nearest cluster

 Recalculate centroids

 Repeat until convergence
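The assign/recompute/repeat loop can be sketched in plain Python. This is a minimal 2-D version with a fixed iteration count and the first k points as initial centroids (both simplifying assumptions; real implementations use smarter seeding like k-means++ and a convergence test):

```python
def kmeans_batch(points, k, iters=20):
    # Minimal batch k-means on 2-D points; first k points serve as initial centroids.
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: recompute each centroid as the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two obvious groups, near (0, 0) and (10, 10); the data is made up for illustration.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans_batch(pts, k=2)
```

On this data the centroids settle at roughly (0.33, 0.33) and (10.33, 10.33), the means of the two groups.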

15. Distinguish classification and clustering

Classification Clustering

Supervised learning Unsupervised learning

Labeled data Unlabeled data

Predicts predefined classes Finds hidden groupings

16. Explain K-means clustering – Online Mode

In online mode, data points are processed one at a time:

 A data point is assigned to the nearest centroid

 The centroid is immediately updated based on this point


→ Suitable for streaming or large datasets

Let me know if you'd like these in a PDF or want to continue with Unit 4!

Here are the 2-mark answers for Unit 4 – Information Retrieval Techniques, clear and to-the-point for your exam prep:

1. What do you mean by mirroring and web archiving?

 Mirroring: Creating an exact replica of a website on another server. Useful for backup and load distribution.

 Web Archiving: Collecting and storing web content over time for future access and research (e.g., Wayback Machine).

2. Write down the techniques of SPAM (Apr 21)

 Keyword stuffing

 Cloaking (showing different content to users and crawlers)

 Link farms

 Hidden text

 Redirection

3. What is a Web Crawler? (Apr 21)

A web crawler (also known as spider or bot) is a program that automatically browses the web, retrieves web pages, and adds them to the search engine index.

4. Compare parallel crawler and meta crawler (Apr 22)

Parallel Crawler Meta Crawler

Crawls many pages simultaneously Sends queries to multiple search engines

Maintains its own index Does not maintain an index

5. Distinguish between general web search and vertical web search

 General Web Search: Searches across the whole web (e.g., Google)

 Vertical Web Search: Focuses on specific domains like images, jobs, travel, etc. (e.g., Google Scholar)

6. Illustrate the hashing technique with example (Apr 22, Apr 23)

Hashing maps data to a fixed-size value.


Example:

 Document ID: 101

 Hash function: h(x) = x % 10

 Hash(101) = 1 → Store document at position 1 in the hash table.
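The example above translates directly into code. The sketch also adds a second document ID (311, an arbitrary value) that hashes to the same slot, showing why buckets hold lists:

```python
def h(x, table_size=10):
    # The example's hash function: h(x) = x % table_size.
    return x % table_size

# A tiny hash table of buckets: each document ID is stored at its hash position.
table = [[] for _ in range(10)]
for doc_id in (101, 205, 311):
    table[h(doc_id)].append(doc_id)
```

h(101) = 1 and h(311) = 1, so both land in bucket 1 (a collision handled by chaining), while 205 goes to bucket 5.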

7. Why might a search engine cluster concepts to improve search results? (Apr 24)

Clustering groups related results to:

 Improve relevance

 Avoid duplicate results

 Organize large result sets


→ Helps users find information faster.

8. What is Link-based Ranking? (Apr 24)

Link-based ranking ranks pages based on incoming and outgoing links, assuming important pages are more likely to be
linked (e.g., PageRank algorithm).
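A minimal PageRank power-iteration sketch on a toy three-page graph (the graph and the damping factor d = 0.85 are illustrative assumptions, not real web data):

```python
def pagerank(links, d=0.85, iters=50):
    # links: page -> list of pages it links to (a toy graph).
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}          # start with uniform rank
    for _ in range(iters):
        nxt = {p: (1 - d) / n for p in pages}  # teleportation term
        for p, outs in links.items():
            share = pr[p] / len(outs)
            for q in outs:
                nxt[q] += d * share            # each page passes rank along its out-links
        pr = nxt
    return pr

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Page C ends up with the highest rank because it receives links from both A and B, which is the core intuition of link-based ranking.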
9. What is web spam / spamdexing?

Spamdexing is the practice of manipulating search engine rankings using deceptive techniques like keyword stuffing and link spamming.

10. What is a spam page?

A spam page is a web page created to deceive search engines and attract traffic without offering genuine or valuable content to users.

11. Distinguish between three levels of link analysis on the web

 Page Level: Analyzes individual pages

 Host Level: Groups pages by host/server

 Domain Level: Groups by entire domain (e.g., example.com)

12. What is the purpose of link analysis?

 To determine importance or authority of web pages

 To improve ranking in search engines

 To detect spam and link farms

13. Analyze about Decision Tree Algorithm with illustration (Apr 22)

A decision tree splits data based on attribute values to make decisions.

Example:
Attribute: Weather

 Sunny → Play

 Rainy → Don't Play


It forms a tree-like structure of conditions to classify new data.

14. Define Classifier. Explain in detail about SVM Classifier (Apr 24)

 A classifier assigns labels to data based on input features.

 SVM (Support Vector Machine) finds the best boundary (hyperplane) that separates data classes with the
maximum margin.

15. State Bayes Theorem. Discuss Naïve Bayesian Classifier with example

Bayes Theorem:
P(A|B) = P(B|A) · P(A) / P(B)

Naïve Bayes Classifier assumes features are independent and applies Bayes theorem for prediction.
Example:
If "Rain" and "Traffic" affect "Late to class", the classifier calculates probabilities for both and predicts the most likely outcome.

Let me know if you’d like to move on to Unit 5, or if you'd like all units combined in a printable format (PDF/Word)!
Here are the 2-mark answers for Unit 5 – Information Retrieval Techniques, clearly explained for your exam prep:

1. What are major components of neighbourhood approach?

 User-based filtering: Finds users similar to the target user.

 Item-based filtering: Finds items similar to the target item.

 Similarity measure: Uses metrics like cosine similarity or Pearson correlation.

 Prediction function: Predicts rating or preference.

2. What are recommender systems? (Apr 24)

Recommender systems are software tools that suggest relevant items (like products, movies, or books) to users based on their preferences, behavior, or past activity.

3. What makes a Search Engine different from a Recommender System?

 Search Engine: Responds to explicit queries entered by users.

 Recommender System: Provides personalized suggestions without an explicit query.

4. Mention the different rating specifications in Recommendation systems (Apr 24)

 Explicit rating: User provides ratings (e.g., 5 stars).

 Implicit rating: Inferred from behavior (e.g., clicks, views).

 Binary rating: Yes/No or Like/Dislike.

5. Point out the advantages of content-based filtering (Apr 21)

 Can recommend new or unrated items (no item cold-start problem).

 Provides personalized recommendations.

 Needs no interaction history from other users.

6. When can collaborative filtering be used? (Apr 21)

Collaborative filtering is used when:

 There is sufficient user interaction data (ratings, purchases).

 You want to recommend items based on similar user preferences.


7. Point out some advantages of Mobile Recommendation System (Apr 22)

 Location-aware suggestions

 Real-time updates

 Personalized experience based on mobile usage

 Convenience and accessibility

8. Examine the broad classification of Recommendation Systems (Apr 22)

 Content-based filtering

 Collaborative filtering

 Hybrid filtering

9. Define meta-level and cascade recommendation system (Apr 23)

 Meta-level: Uses the output of one system (like a learned model) as input to another.

 Cascade: Prioritizes one recommender and uses another to refine the results.

10. List the types of hybrid recommendation systems

 Weighted Hybrid

 Switching Hybrid

 Mixed Hybrid

 Feature Combination

 Cascade

 Meta-level

11. Explain the role of Matrix Factorization techniques in Collaborative Filtering (Apr 24)

Matrix Factorization reduces the user-item rating matrix into latent features (hidden factors).
It improves prediction by capturing hidden patterns between users and items, commonly used in systems like Netflix.
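A tiny stochastic-gradient-descent sketch of the idea, assuming a made-up 2-user × 2-item rating dictionary and k = 2 latent factors (all values, learning rate, and step count are illustrative choices, not a production recipe):

```python
import random

def factorize(R, k=2, steps=2000, lr=0.01, seed=0):
    # R: observed ratings as {(user, item): rating}; minimize squared error by SGD.
    rng = random.Random(seed)
    users = {u for u, _ in R}
    items = {i for _, i in R}
    P = {u: [rng.random() for _ in range(k)] for u in users}   # user latent factors
    Q = {i: [rng.random() for _ in range(k)] for i in items}   # item latent factors
    for _ in range(steps):
        for (u, i), r in R.items():
            pred = sum(a * b for a, b in zip(P[u], Q[i]))
            e = r - pred
            for f in range(k):   # gradient step on both factor vectors
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * e * qi
                Q[i][f] += lr * e * pu
    return P, Q

R = {("u1", "m1"): 5.0, ("u1", "m2"): 1.0, ("u2", "m1"): 4.0, ("u2", "m2"): 1.0}
P, Q = factorize(R)
pred = sum(a * b for a, b in zip(P["u1"], Q["m1"]))
```

After training, the dot product of a user vector and an item vector approximates the observed rating; the same dot product for an unobserved pair is the predicted rating.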

12. Explain in detail about Neighbourhood models of Collaborative Filtering (Apr 24)

 User-based model: Finds users similar to the target user and recommends items they liked.

 Item-based model: Finds similar items and recommends based on past user preferences.
Both rely on similarity measures like cosine similarity to compute closeness.
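The cosine similarity measure mentioned above is a short computation; the rating vectors below are hypothetical users over the same four items:

```python
import math

def cosine(a, b):
    # a, b: rating vectors over a shared set of items.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical ratings by three users over the same four items.
alice = [5, 3, 0, 1]
bob   = [4, 3, 0, 1]
carol = [1, 0, 5, 4]
```

Here cosine(alice, bob) ≈ 0.99 while cosine(alice, carol) ≈ 0.24, so a user-based neighbourhood model would treat Bob as Alice's neighbour and recommend items Bob rated highly.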
