LIBS 894 Assignment Three Classic Models
Individual Assignment
Topic: Three Classic Models in
Information Retrieval System
Submitted by: Rebecca
Course Code: LIBS 894
Lecturer: Aminu Musa
Date: April 2025
Boolean Model
The Boolean model is one of the earliest and most fundamental models in information
retrieval (IR), built on the principles of Boolean algebra. In this model,
documents are represented as sets of terms, and queries are formulated using logical
operators such as AND, OR, and NOT (Baeza-Yates & Ribeiro-Neto, 2011).
Core Principles:
Example:
Assume a digital library contains documents on different topics. A user searches for
“education AND technology”. The Boolean model retrieves only documents that
contain both terms, excluding any that contain only one. If the query is “education OR
technology”, it retrieves all documents that contain either or both terms. If the user
writes “education AND NOT technology”, it returns documents that contain the word
“education” but not the word “technology”.
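The three queries in this example map directly onto set operations. The following is a minimal sketch, assuming a tiny hypothetical collection; the document IDs, texts, and the `build_index` helper are illustrative and not part of any particular system.

```python
# Hypothetical mini-collection: document ID -> text (illustrative only).
docs = {
    "d1": "education policy and technology in schools",
    "d2": "education funding for rural schools",
    "d3": "technology trends in banking",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)
education = index.get("education", set())
technology = index.get("technology", set())

and_result = education & technology   # education AND technology
or_result = education | technology    # education OR technology
not_result = education - technology   # education AND NOT technology

print(sorted(and_result))
print(sorted(or_result))
print(sorted(not_result))
```

Every document in a result set is an equally valid match; nothing in the sets says which one is more relevant.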
Advantages:
- Efficiency: Suitable for databases with structured data and well-defined vocabularies.
- Exact matching: Allows users to precisely control the search scope using logical
operators.
Limitations:
- Lack of ranking: All matching documents are treated equally; there’s no notion of
relevance scoring.
- User complexity: Users must understand how to construct Boolean queries effectively.
Real-World Application:
Boolean retrieval is still widely used in legal databases (e.g., LexisNexis), library catalog
systems, and search interfaces in professional databases like PubMed or Scopus, where
precision and exact filtering are important (Croft, Metzler, & Strohman, 2015).
Vector Space Model
The Vector Space Model (VSM) represents documents and queries as vectors in a
multi-dimensional space where each dimension corresponds to a distinct term. Unlike the
Boolean model, it ranks documents by the similarity between the query vector and each
document vector (Salton, Wong, & Yang, 1975).
Core Principles:
- Term weighting: Weights such as TF-IDF (Term Frequency-Inverse Document
Frequency) are used to reflect the importance of a term within a document and across the
collection.
Example:
Suppose a user enters the query "e-learning platform" and the collection contains two
documents, D1 and D2. Term weights are computed for the query and for each document,
and the cosine similarity is calculated between the query vector and each
document vector. D1 would likely score higher because it shares more terms with the
query.
Advantages:
retrieval quality.
- Partial matching: Even if a document does not contain all query terms, it can still be
retrieved and ranked.
enhancements.
Limitations:
- High dimensionality: Representing each term as a dimension can result in large, sparse
vectors.
terms.
- Semantic gaps: Synonyms or related concepts may not be captured unless additional
techniques are applied.
Real-World Application:
The vector space model is foundational in modern search engines like Google and Bing
and is widely used in text mining, document classification, and recommender systems.
Probabilistic Model
The probabilistic model of information retrieval assumes that, for a given user query,
each document has some probability of being relevant. Documents are ranked according to
this probability.
Core Principles:
- The system estimates this probability using term frequency and document statistics.
The probability that a document is relevant (R) given a query (Q) is estimated as:
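One standard way to write this estimate uses Bayes' rule. The symbol D for the document is an assumption introduced here for illustration; R and Q follow the text. This is a sketch of a common formulation, not necessarily the exact expression used in the cited sources.

```latex
P(R \mid D, Q) = \frac{P(D \mid R, Q)\, P(R \mid Q)}{P(D \mid Q)}
```

Because the denominator does not depend on relevance, documents are often ranked by the odds of relevance, P(R | D, Q) / P(not R | D, Q), which gives the same ordering.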
Example:
If a user searches for “remote work policies,” the system examines the frequency of these
terms in previously judged relevant vs. non-relevant documents and calculates the
probability that each document is relevant.
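Term statistics from judged documents can be turned into scores in several ways. One is the relevance weight of Robertson and Sparck Jones (1976) with 0.5 smoothing, sketched below. All counts are hypothetical, and the `rsj_weight` and `score` helpers are illustrative names rather than functions from any library.

```python
import math

def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: judged-relevant documents that contain the term
    R: judged-relevant documents in total
    n: documents in the collection that contain the term
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (R - r + 0.5) * (n - r + 0.5)
    return math.log(numerator / denominator)

# Hypothetical collection size and relevance judgements.
N, R = 1000, 10
term_stats = {          # term -> (r, n), assumed counts
    "remote": (8, 50),
    "work": (9, 400),
    "policies": (6, 80),
}
weights = {term: rsj_weight(r, R, n, N) for term, (r, n) in term_stats.items()}

def score(doc_terms):
    """Sum the weights of the query terms present in the document."""
    return sum(w for term, w in weights.items() if term in doc_terms)

print({term: round(w, 2) for term, w in weights.items()})
print(round(score({"remote", "work", "handbook"}), 2))
```

A term that is concentrated in the judged-relevant documents, like "remote" in these assumed counts, gets a larger weight than a term that is common across the whole collection, like "work".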
Advantages:
relevance.
Limitations:
available at first.
always realistic.
Real-World Application:
Probabilistic models underpin modern ranking systems in web search engines and are
the foundation of algorithms like BM25, used in Elasticsearch, Solr, and other full-text
search engines.
References
Croft, W. B., Metzler, D., & Strohman, T. (2015). *Search engines: Information retrieval
in practice*.
Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms.
*Journal of the American Society for Information Science, 27*(3), 129-146.
https://doi.org/10.1002/asi.4630270302
Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and
beyond. *Foundations and Trends in Information Retrieval, 3*(4), 333-389.
https://doi.org/10.1561/1500000019
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic
indexing. *Communications of the ACM, 18*(11), 613-620.
https://doi.org/10.1145/361219.361220