Vector Space Model
We consider the vector space model, which is based on the bag-of-words representation. Documents and queries are represented as vectors.
Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
Several different ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is tf-idf weighting.
The definition of term depends on the application. Typically terms are single words, keywords, or longer phrases. If words
are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of
distinct words occurring in the corpus).
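As a concrete illustration of the point above, the following sketch builds the vocabulary of a tiny invented corpus and turns each document into a term-frequency vector whose dimensionality equals the number of distinct words:

```python
# A minimal bag-of-words sketch; the two-document corpus is invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Vocabulary: all distinct words in the corpus, one vector dimension per word.
vocabulary = sorted({word for doc in corpus for word in doc.split()})

def to_vector(text):
    """Raw term-frequency vector over the shared vocabulary."""
    words = text.split()
    return [words.count(term) for term in vocabulary]

vectors = [to_vector(doc) for doc in corpus]
print(vocabulary)        # each entry is one dimension
print(len(vocabulary))   # dimensionality of every document vector
print(vectors[0])        # non-zero entries mark terms occurring in the document
```

Here the vocabulary has 7 distinct words, so every document (and query) lives in a 7-dimensional space.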
The Vector Space Model represents documents and queries as vectors in a multi-dimensional space, where each dimension corresponds to a unique term in the entire corpus of documents.
1. Vector Representation: We represent documents and queries as vectors using techniques like TF-IDF. Each
document in the corpus and the query are converted into vectors in the same high-dimensional space.
2. Cosine Similarity Calculation: To determine the relevance of a document to a query, we calculate the cosine
similarity between the query vector and the vectors representing each document in the corpus.
3. Ranking: Documents with higher cosine similarity scores to the query are considered more relevant and are ranked
higher. Those with lower scores are ranked lower.
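The three steps above can be sketched end to end with the standard library only. The corpus, query, and the particular smoothed idf variant are assumptions chosen for illustration, not a definitive implementation:

```python
import math

# Step 1: represent documents and the query as tf-idf vectors (invented toy corpus).
corpus = [
    "information retrieval with the vector space model",
    "boolean retrieval model",
    "cooking recipes for pasta",
]
query = "vector space retrieval"

vocab = sorted({w for doc in corpus for w in doc.split()} | set(query.split()))

def tf_idf_vector(text):
    words = text.split()
    n_docs = len(corpus)
    vec = []
    for term in vocab:
        tf = words.count(term)
        df = sum(1 for d in corpus if term in d.split())
        idf = math.log(n_docs / (1 + df)) + 1  # one common smoothed idf variant
        vec.append(tf * idf)
    return vec

# Step 2: cosine similarity between the query vector and each document vector.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Step 3: rank documents by descending similarity score.
q_vec = tf_idf_vector(query)
scores = sorted(
    ((cosine(q_vec, tf_idf_vector(doc)), doc) for doc in corpus),
    reverse=True,
)
for score, doc in scores:
    print(f"{score:.3f}  {doc}")
```

The document sharing the most query terms ranks first, while the document with no overlapping terms scores zero.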
The key idea behind cosine similarity is to calculate the cosine of the angle between two vectors. If the vectors are
very similar, their angle will be small, and the cosine value will be close to 1. Conversely, if the vectors are dissimilar,
the angle will be large, and the cosine value will approach 0.
The formula for calculating cosine similarity between two vectors A and B is as follows:

cos(θ) = (A⋅B) / (∥A∥ ∥B∥)

Where:
A⋅B represents the dot product of vectors A and B.
∥A∥ and ∥B∥ represent the Euclidean norms (magnitudes) of vectors A and B, respectively.
The cosine similarity value ranges from -1 (completely dissimilar) to 1 (completely similar); with non-negative tf-idf weights it stays between 0 and 1. A higher cosine similarity score indicates a more relevant document.
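A quick numeric check of the formula on two small invented vectors:

```python
import math

# Worked example of cos(θ) = (A⋅B) / (∥A∥ ∥B∥) on two invented vectors.
A = [1, 2, 0]
B = [2, 1, 1]

dot = sum(a * b for a, b in zip(A, B))     # A⋅B = 1*2 + 2*1 + 0*1 = 4
norm_a = math.sqrt(sum(a * a for a in A))  # ∥A∥ = sqrt(5)
norm_b = math.sqrt(sum(b * b for b in B))  # ∥B∥ = sqrt(6)
cos_sim = dot / (norm_a * norm_b)          # 4 / sqrt(30) ≈ 0.730
print(round(cos_sim, 4))
```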
The vector space model has the following limitations compared to the Standard Boolean model:
1. Query terms are assumed to be independent, so phrases might not be represented well in the ranking.
2. Semantic sensitivity: documents with similar context but different term vocabulary won't be associated. [2]
Example:
The tf-idf weight of a term is calculated as tf × idf (term frequency multiplied by inverse document frequency).
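The tf × idf computation can be sketched as follows; the three-document corpus and the particular tf/idf variants (length-normalised tf, unsmoothed idf) are assumptions for illustration:

```python
import math

# Sketch of tf-idf = tf × idf on a tiny invented corpus.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog ran",
]

def tf(term, doc):
    """Term frequency, normalised by document length."""
    words = doc.split()
    return words.count(term) / len(words)

def idf(term):
    """Inverse document frequency: log of (corpus size / document frequency)."""
    df = sum(1 for doc in corpus if term in doc.split())
    return math.log(len(corpus) / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf_idf("cat", corpus[0]))  # moderately rare term: positive weight
print(tf_idf("the", corpus[0]))  # appears in every document, so idf = 0
```

A term like "the" that occurs in every document gets idf = log(1) = 0, so its tf-idf weight vanishes regardless of how often it appears.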