LIBS 894 Assignment Three Classic Models

The document discusses three classic models in information retrieval: the Boolean model, the Vector Space Model, and the Probabilistic model. Each model is explained with core principles, advantages, limitations, and real-world applications, highlighting their relevance in various search systems. The document serves as an academic assignment submitted by a student in a library and information science course.


LIBS 894: INFORMATION RETRIEVAL SYSTEM
Individual Assignment
Topic: Three Classic Models in Information Retrieval System
Submitted by: Rebecca
Course Code: LIBS 894
Lecturer: Aminu Musa
Date: April 2025
Boolean Model

The Boolean model is one of the earliest and most fundamental models in information retrieval (IR), developed from the principles of Boolean algebra. In this model, documents are represented as sets of terms, and queries are formulated using the logical operators AND, OR, and NOT. Document relevance is treated as binary: either a document matches a query (relevant) or it does not (non-relevant) (Baeza-Yates & Ribeiro-Neto, 2011).

Core Principles:

- AND: A document must contain all the specified terms.
- OR: A document must contain at least one of the specified terms.
- NOT: A document must not contain the specified term.

Example:

Assume a digital library contains documents on different topics. A user who searches for "education AND technology" retrieves only documents that contain both terms, excluding any that contain only one. The query "education OR technology" retrieves all documents that contain either or both terms, while "education AND NOT technology" returns documents that contain "education" but not "technology."
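The three operators reduce to set operations on an inverted index. A minimal sketch in Python (the document collection and term lists here are invented for illustration, not taken from the assignment):

```python
# Build a tiny inverted index and answer the three example queries with
# set operations. The document collection is a hypothetical illustration.
docs = {
    1: "education technology improves learning",
    2: "education policy in public schools",
    3: "technology trends in healthcare",
}

index = {}  # term -> set of IDs of documents containing it
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

education = index.get("education", set())
technology = index.get("technology", set())

print(sorted(education & technology))  # AND     -> [1]
print(sorted(education | technology))  # OR      -> [1, 2, 3]
print(sorted(education - technology))  # AND NOT -> [2]
```

Note that every matching document is returned with equal status; the intersection, union, and difference carry no ordering, which is exactly the lack of ranking discussed below.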

Advantages:

- Simplicity: The model is easy to understand and implement.
- Efficiency: Suitable for databases with structured data and well-defined vocabularies.
- Exact matching: Allows users to control the search scope precisely using logical operators.

Limitations:

- Lack of ranking: All matching documents are treated equally; there is no notion of relevance scoring.
- Rigid matching: A document that uses synonyms or alternate phrasing may be missed unless those variants are explicitly included in the query.
- User complexity: Users must understand how to construct Boolean queries effectively, which may be challenging for novices.

Real-World Application:

Boolean retrieval is still widely used in legal databases (e.g., LexisNexis), library catalog systems, and search interfaces in professional databases such as PubMed and Scopus, where precision and exact filtering are important (Croft, Metzler, & Strohman, 2015).

Vector Space Model

The Vector Space Model (VSM) represents documents and queries as vectors in a multi-dimensional space where each dimension corresponds to a distinct term. Unlike the Boolean model, it provides a graded notion of relevance by measuring the cosine similarity between the query vector and each document vector (Salton, Wong, & Yang, 1975).

Core Principles:

- Each document and query is represented as a vector of term weights.
- Term-weighting schemes such as TF-IDF (Term Frequency-Inverse Document Frequency) reflect the importance of a term within a document and across the collection.
- Cosine similarity between a document vector D and a query vector Q is calculated as:

cosine(θ) = (D · Q) / (||D|| × ||Q||)

Example:

Suppose a user enters the query "e-learning platform". Two documents are:

- D1: "E-learning has transformed education using online platforms."
- D2: "Healthcare platforms improve patient outcomes."

After preprocessing (stopword removal, stemming, etc.), term frequencies are computed and the cosine similarity between the query vector and each document vector is calculated. D1 would likely score higher because it overlaps more with the query terms.
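This ranking step can be sketched with raw term frequencies. This is a simplification of what the model describes: a real system would apply TF-IDF weighting and stemming (so "platform" and "platforms" would match), whereas the naive tokenizer below matches only exact terms:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def vectorize(text):
    """Toy preprocessing: lowercase and split on whitespace (no stemming)."""
    return Counter(text.lower().split())

query = vectorize("e-learning platform")
d1 = vectorize("e-learning has transformed education using online platforms")
d2 = vectorize("healthcare platforms improve patient outcomes")

print(cosine(query, d1))  # nonzero: shares the term "e-learning"
print(cosine(query, d2))  # 0.0: no exact term in common with the query
```

With stemming added, D1 would also match on "platform"/"platforms" and score higher still; either way D1 outranks D2, as the example above anticipates.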

Advantages:

- Relevance ranking: Documents are ranked by similarity score, improving retrieval quality.
- Partial matching: A document that does not contain all query terms can still be retrieved based on similarity.
- Scalability: Effective for large-scale systems and adaptable to machine-learning enhancements.

Limitations:

- High dimensionality: Representing each term as a dimension can result in large, sparse
vectors.

- Term independence assumption: It ignores the relationships or dependencies between

terms.

- Semantic gaps: Synonyms or related concepts may not be captured unless additional

processing (e.g., Latent Semantic Indexing) is used.

Real-World Application:

The vector space model is foundational in modern search engines such as Google and Bing and is widely used in text mining, document classification, and recommender systems (Manning, Raghavan, & Schütze, 2008).

Probabilistic Model

The Probabilistic model of information retrieval assumes that, given a user query, there exists a probability that each document is relevant. Documents are ranked according to this estimated probability of relevance, treating retrieval as a problem of inference under uncertainty (Robertson & Sparck Jones, 1976).

Core Principles:

- Each document has a probability of being relevant to a given query.
- The system estimates this probability using term frequency and document statistics.
- The most common implementation is the Binary Independence Model (BIM).

Under this model, documents are ranked by a score proportional to:

P(R|D, Q) ∝ ∏ (P(t|R) / P(t|¬R)), taken over each query term t that appears in D

Where:

- P(t|R): the probability that term t appears in relevant documents.
- P(t|¬R): the probability that term t appears in non-relevant documents.

Example:

If a user searches for "remote work policies," the system examines how often these terms occur in previously judged relevant versus non-relevant documents and estimates the likelihood that new documents containing similar patterns are relevant.
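A sketch of this estimation, using smoothed log-odds term weights in the style of Robertson and Sparck Jones. The relevance-judgment counts below are invented for illustration; in practice they come from relevance feedback:

```python
import math

# Hypothetical judged collection: 10 relevant, 90 non-relevant documents.
N_REL, N_NONREL = 10, 90
# term -> (relevant docs containing it, non-relevant docs containing it)
term_counts = {
    "remote": (8, 20),
    "work": (7, 30),
    "policies": (6, 10),
}

def term_weight(r, n):
    """Smoothed log-odds weight log[p(1-q) / q(1-p)], with 0.5 smoothing."""
    p = (r + 0.5) / (N_REL + 1.0)      # estimate of P(t|R)
    q = (n + 0.5) / (N_NONREL + 1.0)   # estimate of P(t|not R)
    return math.log((p * (1 - q)) / (q * (1 - p)))

def score(doc_terms):
    """Rank score: sum of weights for query terms present in the document."""
    return sum(term_weight(r, n)
               for t, (r, n) in term_counts.items() if t in doc_terms)

print(score({"remote", "work", "policies", "handbook"}))  # all query terms
print(score({"work", "schedule"}))                        # one query term
```

Summing log weights instead of multiplying probability ratios gives the same ranking while avoiding numerical underflow; the 0.5 smoothing keeps the weights defined when a term has no judged occurrences yet.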

Advantages:

- Theoretical foundation: Grounded in Bayesian probability theory.
- Relevance feedback: Retrieval improves through user feedback on document relevance.
- Ranking by likelihood: Documents are ranked intuitively by their probability of relevance.

Limitations:

- Initial estimation: Relevance probabilities must be estimated, and the necessary relevance judgments may not be available at first.
- Independence assumption: Terms are assumed to be conditionally independent, which is not always realistic.
- Complexity: More computationally intensive than the Boolean or vector space models.

Real-World Application:

Probabilistic models underpin modern ranking systems in web search engines and are foundational to algorithms such as BM25, used in Elasticsearch, Solr, and other full-text search libraries (Robertson & Zaragoza, 2009).
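BM25 extends the probabilistic weighting above with term-frequency saturation and document-length normalization. A sketch of the per-term score, using one common smoothed-IDF variant; the parameter defaults k1 = 1.2 and b = 0.75 are conventional choices, not prescribed by any source cited here:

```python
import math

def bm25_term(tf, df, N, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term to a document's score.

    tf: term frequency in the document; df: documents containing the term;
    N: collection size; doc_len/avg_len drive length normalization.
    """
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# Term-frequency saturation: doubling tf less than doubles the score.
print(bm25_term(tf=1, df=10, N=1000, doc_len=100, avg_len=120))
print(bm25_term(tf=2, df=10, N=1000, doc_len=100, avg_len=120))
```

The saturation term is what separates BM25 from raw TF-IDF: repeated occurrences of a term add progressively less evidence of relevance.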


References

Baeza-Yates, R., & Ribeiro-Neto, B. (2011). *Modern information retrieval: The concepts and technology behind search* (2nd ed.). Addison-Wesley.

Croft, W. B., Metzler, D., & Strohman, T. (2015). *Search engines: Information retrieval in practice* (2nd ed.). Pearson.

Manning, C. D., Raghavan, P., & Schütze, H. (2008). *Introduction to information retrieval*. Cambridge University Press.

Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. *Journal of the American Society for Information Science*, 27(3), 129–146. https://doi.org/10.1002/asi.4630270302

Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. *Foundations and Trends in Information Retrieval*, 3(4), 333–389. https://doi.org/10.1561/1500000019

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. *Communications of the ACM*, 18(11), 613–620. https://doi.org/10.1145/361219.361220
