All Unit 2 Mark
Here are clear and concise 2-mark answers from Unit 1, just the way you
asked:
Focuses on retrieving relevant documents from a large collection based on user queries.
Web Search:
Involves crawling, indexing web pages, and ranking using algorithms (like Google).
Metasearch engines (e.g., Dogpile – fetch results from multiple engines)
5. What are the major components of traditional IRT? (Apr 24)
Ambiguity in queries
Enhances navigation and retrieval by providing multiple paths to related content.
The e-Publishing era refers to the time when content is created, distributed, and accessed digitally rather than in print.
It enables fast, wide, and cost-effective distribution of books, journals, and documents.
Investigation in IR refers to the process of exploring, searching, and analyzing data or documents to find useful or
relevant information.
It helps users dig deeper into the data for insights or answers to queries.
Sensemaking is the process of interpreting and understanding information to make informed decisions.
In IR, it helps users understand search results and connect information meaningfully.
Let me know if you want these compiled in a PDF or want answers for Unit 2 next!
Here are the Unit 2 – Information Retrieval Technique 2-mark answers in a clear and simple way for your exam prep:
Unigram Model: Assumes each word in a document is independent. Probability of a sentence is the product of
probabilities of individual words.
Bigram Model: Considers the probability of a word based on the previous word. It captures some word
dependencies.
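A minimal sketch of both models, assuming a tiny made-up corpus and plain maximum-likelihood estimates with no smoothing (the corpus and function names are illustrative only):

```python
from collections import Counter

# Hypothetical toy corpus (not from the notes), split on whitespace.
corpus = "the cat sat on the mat".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total_words = len(corpus)


def unigram_prob(sentence):
    """P(sentence) as the product of independent word probabilities."""
    prob = 1.0
    for word in sentence:
        prob *= unigram_counts[word] / total_words
    return prob


def bigram_prob(prev, word):
    """P(word | prev), estimated as count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


print(unigram_prob(["the", "cat"]))  # (2/6) * (1/6)
print(bigram_prob("the", "cat"))     # "the" is followed by "cat" once out of two times
```

The unigram score ignores word order entirely, while the bigram estimate depends on which word came before.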
LSI is a technique that reduces the dimensionality of the term-document matrix using Singular Value Decomposition
(SVD) to capture hidden relationships between terms and documents.
It helps overcome synonymy and polysemy in retrieval.
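In symbols, using the usual textbook notation (an assumption, since the notes give none): A is the term-document matrix, and only the k largest singular values are kept.

```latex
A \approx A_k = U_k \, \Sigma_k \, V_k^{\top},
\qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q
```

Here the columns of U_k describe terms and the columns of V_k describe documents in the reduced k-dimensional space, and a query vector q is mapped into the same space as \hat{q} before matching.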
A zone index divides a document into multiple zones (like title, abstract, body) and assigns weights to terms based on the
zone.
Terms in important zones like the title may be given higher importance.
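Weighted zone scoring could be sketched as below. The zone weights and the sample document are hypothetical; the notes do not give specific values.

```python
# Hypothetical zone weights, chosen to sum to 1 (not from the notes).
ZONE_WEIGHTS = {"title": 0.5, "abstract": 0.3, "body": 0.2}


def weighted_zone_score(doc, term):
    """Add up the weights of every zone whose text contains the term."""
    term = term.lower()
    return sum(
        weight
        for zone, weight in ZONE_WEIGHTS.items()
        if term in doc.get(zone, "").lower().split()
    )


# Hypothetical document with three zones.
doc = {
    "title": "Information retrieval basics",
    "abstract": "An overview of indexing",
    "body": "Retrieval systems rank documents",
}

print(weighted_zone_score(doc, "retrieval"))  # title + body weights
print(weighted_zone_score(doc, "indexing"))   # abstract weight only
```

A term that appears in the title contributes more to the score than the same term appearing only in the body.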
Relevance feedback is a technique where the system improves search results based on user feedback.
If a user marks results as relevant or irrelevant, the system refines future queries for better accuracy.
Inverse Document Frequency (IDF): Measures how rare a term is across all documents.
→ TF shows local importance, IDF shows global uniqueness.
Query Expansion is the process of enhancing a user query by adding related or synonymous terms.
It helps retrieve more relevant results by reducing vocabulary mismatch between the query and documents.
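A minimal sketch of synonym-based expansion, assuming a small hand-written synonym table (a real system might draw on a thesaurus or co-occurrence statistics instead):

```python
# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query):
    """Return the original terms followed by their listed synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("cheap car"))
```

A document that says "affordable automobile" can now match the query "cheap car" even though it shares no words with it.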
(You can draw a 3-layer diagram with arrows showing input → hidden → output)
8. Define Term Weighting and list its two factors. (Apr 23, Apr 24)
9. Define TF-IDF.
TF-IDF (Term Frequency – Inverse Document Frequency) is a term weighting scheme that gives high scores to terms that
occur frequently in a document but rarely in the entire collection.
IDF reduces the weight of common terms that appear in many documents.
TF-IDF = TF × IDF
→ Helps identify important and unique terms in documents.
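A small worked sketch of the formula, assuming raw counts for TF and log(N / df) for IDF (other variants exist) and two made-up documents:

```python
import math

# Hypothetical tokenized documents (not from the notes).
docs = {
    "D1": ["data", "science", "is", "powerful"],
    "D2": ["science", "is", "fun"],
}


def tf(term, doc_id):
    """Raw count of the term in one document."""
    return docs[doc_id].count(term)


def idf(term):
    """log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc_id):
    return tf(term, doc_id) * idf(term)


print(tf_idf("data", "D1"))     # appears only in D1, so the weight is positive
print(tf_idf("science", "D1"))  # appears in every document, so IDF is 0
```

The term that occurs in every document gets a weight of zero, which is exactly the "common terms are down-weighted" effect described above.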
Inversion is the process of creating an inverted index, where for each term, a list of documents in which it appears is
stored.
It allows fast retrieval of documents matching a query term.
An Information Retrieval Model is a framework that defines how documents are represented, matched, and ranked
based on a query.
Examples: Boolean Model, Vector Space Model, Probabilistic Model.
Here are the 2-mark answers for the questions you asked from Unit 2 – IRT:
Document preprocessing is the process of cleaning and preparing text data before indexing or retrieval.
It includes steps like:
An inverted index maps terms to the list of documents (called a posting list) where they appear.
Example:
Documents:
Inverted Index:
-----------------------------
data → D1
science → D1, D2
is → D1, D2
powerful → D1
fun → D2
This structure allows fast lookup of documents that contain a specific word.
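Building such an index can be sketched in a few lines. The two document texts below are assumptions: the notes do not show them, so they were chosen only so that the result matches the index listed above.

```python
from collections import defaultdict

# Assumed document texts (not given in the notes).
docs = {
    "D1": "Data science is powerful",
    "D2": "Science is fun",
}


def build_inverted_index(documents):
    """Map each lowercase term to a sorted list of the documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        # set() ensures a document is listed once per term, even if the term repeats.
        for term in set(documents[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


index = build_inverted_index(docs)
print(index["science"])  # posting list for "science"
print(index["fun"])      # posting list for "fun"
```

Answering a one-word query is then a single dictionary lookup rather than a scan of every document.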
Let me know if you want these compiled or if you're moving to Unit 3 next!
Here are the 2-mark answers for Unit 3 – Information Retrieval Techniques, explained clearly and concisely for exam
prep:
1. List the evaluation metrics to assess the quality of single-label text classifiers
Precision
Recall
F1-score
Accuracy
Confusion Matrix
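These metrics can all be derived from the confusion matrix counts. The sketch below assumes the simplest case of two classes and uses made-up labels; for more classes the same counts are usually computed per class and then averaged.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```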
d1: To do is to be. To be is to do
d4: Do do do, da da da
i) Inverted Index:
Term → Documents
--------------------------
to → d1, d2
do → d1, d3, d4
is → d1
be → d1, d2, d3
or → d2
not → d2
i → d2, d3
am → d2, d3
what → d2
think → d3
therefore → d3
da → d4
Average-link: Distance = average distance between all point pairs across clusters.
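As a small illustration of the average-link rule, assuming Euclidean distance and two made-up clusters of 2-D points:

```python
import math
from itertools import product


def average_link(cluster_a, cluster_b):
    """Mean Euclidean distance over all pairs (a, b), a from cluster_a and b from cluster_b."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clusters: the pairwise distances are 5 and 3.
cluster_a = [(0, 0), (2, 0)]
cluster_b = [(5, 0)]
print(average_link(cluster_a, cluster_b))  # 4.0
```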
6. List the basic types of machine learning algorithms (based on learning mechanism)
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Supervised learning is a machine learning method where the model is trained on labeled data, meaning both input and
correct output are provided.
Text clustering is the process of grouping similar text documents into clusters based on their content without using
labels.
Text classification is the task of assigning a category label to a text document based on its content using machine
learning.
K-means
Hierarchical clustering
DBSCAN
Naive Bayes
Decision Trees
Logistic Regression
Random Forest
Recalculate centroids
Repeat until convergence
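The loop these steps belong to can be sketched as follows, assuming the usual K-means procedure (assign each point to its nearest centroid, recalculate centroids, repeat). Taking the first k points as starting centroids is a simplification made here to keep the example deterministic; the sample points are made up.

```python
import math


def k_means(points, k, max_iters=100):
    """Return (centroids, clusters) after iterating until the centroids stop moving."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    clusters = []
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)

        # Update step: recalculate each centroid as the mean of its cluster.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

        # Convergence: stop when no centroid changed.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```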
Classification vs. Clustering
Let me know if you'd like these in a PDF or want to continue with Unit 4!
Here are the 2-mark answers for Unit 4 – Information Retrieval Techniques, clear and to-the-point for your exam prep:
Mirroring: Creating an exact replica of a website on another server. Useful for backup and load distribution.
Web Archiving: Collecting and storing web content over time for future access and research (e.g., Wayback
Machine).
Keyword stuffing
Link farms
Hidden text
Redirection
3. What is a Web Crawler? (Apr 21)
A web crawler (also known as spider or bot) is a program that automatically browses the web, retrieves web pages, and
adds them to the search engine index.
Crawls many pages simultaneously Sends queries to multiple search engines
5. Distinguish between general web search and vertical web search
General Web Search: Searches across the whole web (e.g., Google)
Vertical Web Search: Focuses on specific domains like images, jobs, travel, etc. (e.g., Google Scholar)
6. Illustrate the hashing technique with example (Apr 22, Apr 23)
7. Why search engine might cluster concepts to improve search result (Apr 24)
Improve relevance
Link-based ranking ranks pages based on incoming and outgoing links, assuming important pages are more likely to be
linked (e.g., PageRank algorithm).
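A simplified PageRank sketch using power iteration. The damping factor of 0.85, the fixed iteration count, and the three-page link graph are assumptions for illustration, not details from the notes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to; returns page -> rank."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C receives links from both A and B.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

In this graph C ends up with the highest rank because it has the most incoming links, and B ends up lowest because it receives only half of A's rank.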
9. What is web spam / spamdexing?
Spamdexing is the practice of manipulating search engine rankings using deceptive techniques like keyword stuffing and
link spamming.
A spam page is a web page created to deceive search engines and attract traffic without offering genuine or valuable
content to users.
11. Distinguish between three levels of link analysis on the web
13. Analyze about Decision Tree Algorithm with illustration (Apr 22)
Example:
Attribute: Weather
Sunny → Play
14. Define Classifier. Explain in detail about SVM Classifier (Apr 24)
SVM (Support Vector Machine) finds the best boundary (hyperplane) that separates data classes with the
maximum margin.
15. State Bayes Theorem. Discuss Naïve Bayesian Classifier with example
Bayes Theorem:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Naïve Bayes Classifier assumes features are conditionally independent given the class and applies Bayes' theorem for prediction.
Example:
If "Rain" and "Traffic" affect "Late to class", the classifier calculates probabilities for both and predicts the most likely
outcome.
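The Rain/Traffic example can be sketched as below. The six training records are invented for illustration, and no smoothing is applied, so an unseen feature value gives a class a score of zero.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (rain, traffic, outcome).
data = [
    ("yes", "yes", "late"),
    ("yes", "no", "late"),
    ("no", "yes", "late"),
    ("no", "no", "on_time"),
    ("no", "no", "on_time"),
    ("yes", "no", "on_time"),
]


def train(records):
    """Count how often each class occurs and how often each feature value occurs per class."""
    class_counts = Counter(label for *_, label in records)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for *features, label in records:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def predict(features, class_counts, feature_counts):
    """Score each class as P(class) * product of P(feature value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(features):
            score *= feature_counts[(i, label)][value] / count  # likelihood
        scores[label] = score
    return max(scores, key=scores.get), scores


class_counts, feature_counts = train(data)
print(predict(("yes", "yes"), class_counts, feature_counts))
print(predict(("no", "no"), class_counts, feature_counts))
```

The scores are proportional to the posterior probabilities: the denominator P(B) from Bayes' theorem is the same for every class, so it can be dropped when only the most likely class is needed.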
Let me know if you’d like to move on to Unit 5, or if you'd like all units combined in a printable format (PDF/Word)!
Here are the 2-mark answers for Unit 5 – Information Retrieval Techniques, clearly explained for your exam prep:
Similarity measure: Uses metrics like cosine similarity or Pearson correlation.
Recommender systems are software tools that suggest relevant items (like products, movies, or books) to users based on
their preferences, behavior, or past activity.
Real-time updates
Content-based filtering
Collaborative filtering
Hybrid filtering
Meta-level: Uses the output of one system (like a learned model) as input to another.
Cascade: Prioritizes one recommender and uses another to refine the results.
Weighted Hybrid
Switching Hybrid
Mixed Hybrid
Feature Combination
Cascade
Meta-level
11. Explain the role of Matrix Factorization techniques in Collaborative Filtering (Apr 24)
Matrix Factorization reduces the user-item rating matrix into latent features (hidden factors).
It improves prediction by capturing hidden patterns between users and items, commonly used in systems like Netflix.
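One common way to write this down is shown below. The notation is an assumption, since the notes give none: R is the user-item rating matrix, p_u and q_i are the latent vectors of user u and item i, and λ is a regularization constant.

```latex
R \approx P Q^{\top},
\qquad
\hat{r}_{ui} = p_u^{\top} q_i,
\qquad
\min_{P,\,Q} \sum_{(u,i)\,\text{observed}} \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
```

The sum runs only over ratings that users actually gave; the learned vectors are then used to predict the missing entries.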
12. Explain in detail about Neighbourhood models of Collaborative Filtering (Apr 24)
User-based model: Finds users similar to the target user and recommends items they liked.
Item-based model: Finds similar items and recommends based on past user preferences.
Both rely on similarity measures like cosine similarity to compute closeness.
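A minimal sketch of the user-based model. The ratings are made up, cosine similarity is computed over co-rated items only (one of several common variants), and a single nearest neighbour is used for simplicity.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def most_similar_user(target):
    """Return the other user whose ratings are closest to the target's."""
    others = [user for user in ratings if user != target]
    return max(others, key=lambda user: cosine_similarity(ratings[target], ratings[user]))


def recommend(target):
    """Suggest items the nearest neighbour rated that the target has not rated yet."""
    neighbour = most_similar_user(target)
    return sorted(item for item in ratings[neighbour] if item not in ratings[target])


print(most_similar_user("alice"))
print(recommend("alice"))
```

An item-based model follows the same pattern with the roles swapped: similarity is computed between item rating vectors instead of user rating vectors.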