


Cosine Similarity

What is Cosine Similarity?

Cosine similarity is a metric used to measure the similarity of two vectors. Specifically, it
measures the similarity in the direction or orientation of the vectors, ignoring differences in
their magnitude or scale. Both vectors need to belong to the same inner product space,
meaning their inner product must produce a scalar. The similarity of two vectors is measured
by the cosine of the angle between them.

How to calculate Cosine Similarity

We define cosine similarity mathematically as the dot product of the vectors divided by the
product of their magnitudes. For example, if we have two vectors, A and B, the similarity
between them is calculated as:

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where

• $\theta$ is the angle between the vectors,
• $A \cdot B$ is the dot product of A and B, calculated as $A \cdot B = \sum_{i=1}^{n} A_i B_i$,
• $\|A\|$ represents the L2 norm or magnitude of a vector, calculated as $\|A\| = \sqrt{\sum_{i=1}^{n} A_i^2}$.

The similarity can take values between -1 and +1. Smaller angles between vectors produce
larger cosine values, indicating greater cosine similarity. For example:

• When two vectors have the same orientation, the angle between them is 0, and the
cosine similarity is 1.
• Perpendicular vectors have a 90-degree angle between them and a cosine similarity of
0.
• Opposite vectors have an angle of 180 degrees between them and a cosine similarity
of -1.

Here's a graphic showing pairs of vectors with similarities close to 1, close to 0, and close to -1.
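
These three cases can also be verified with a few lines of NumPy. The vectors below are made up purely for illustration, and cos_sim is a small helper written just for this check:

import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Same orientation -> similarity of 1
print(cos_sim(np.array([1, 2]), np.array([2, 4])))    # ≈ 1.0
# Perpendicular vectors -> similarity of 0
print(cos_sim(np.array([1, 0]), np.array([0, 3])))    # 0.0
# Opposite orientation -> similarity of -1
print(cos_sim(np.array([1, 2]), np.array([-1, -2])))  # ≈ -1.0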

Applications

Cosine similarity is beneficial for applications that use sparse data, such as word-count
vectors for text documents, market-basket transaction data, and recommendation systems,
because cosine similarity ignores 0-0 matches. Counting 0-0 matches in sparse data would
inflate similarity scores. Another commonly used metric that ignores 0-0 matches is Jaccard
similarity.
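
To make this concrete, here is a minimal illustrative sketch (the vectors and the simple-matching comparison are not part of the original example) contrasting cosine similarity with a simple matching coefficient, which does count 0-0 agreements:

import numpy as np

# Two sparse binary vectors that share only one non-zero feature
a = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
b = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])

# Simple matching coefficient counts the 0-0 positions as agreements,
# so the many shared zeros inflate the score
simple_matching = np.mean(a == b)                          # 0.8

# Cosine similarity ignores positions where both vectors are zero
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # 0.5
print(simple_matching, cosine)

For these two sparse vectors, simple matching looks high only because of the shared zeros, while cosine similarity reflects the single shared non-zero feature.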

Cosine Similarity is widely used in Data Science and Machine Learning applications.
Examples include measuring the similarity of:

• Documents in natural language processing
• Movies, books, videos, or users in recommendation systems
• Images in computer vision

Numerical Example

Suppose that our goal is to calculate the cosine similarity of the two documents given
below.

• Document 1 = 'the best data science course'
• Document 2 = 'data science is popular'

After creating a word table from the documents, the documents can be represented by the
following vectors:

      the   best   data   science   course   is   popular
D1     1     1      1        1         1      0      0
D2     0     0      1        1         0      1      1

Using these two vectors we can calculate cosine similarity. First, we calculate the dot
product of the vectors:

$$D_1 \cdot D_2 = 1\cdot0 + 1\cdot0 + 1\cdot1 + 1\cdot1 + 1\cdot0 + 0\cdot1 + 0\cdot1 = 2$$

Second, we calculate the magnitudes of the vectors:

$$\|D_1\| = \sqrt{1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2} = \sqrt{5}, \qquad \|D_2\| = \sqrt{0^2 + 0^2 + 1^2 + 1^2 + 0^2 + 1^2 + 1^2} = 2$$

Finally, cosine similarity is calculated by dividing the dot product by the product of the magnitudes:

$$\text{similarity} = \cos(\theta) = \frac{D_1 \cdot D_2}{\|D_1\| \, \|D_2\|} = \frac{2}{2\sqrt{5}} = \frac{1}{\sqrt{5}} \approx 0.4472$$

The angle between the vectors is calculated as:

$$\theta = \arccos(0.4472) \approx 63.43^\circ$$

Python Example

We will use NumPy to perform the cosine similarity calculations.

Below, we define a function that takes two vectors and returns their cosine similarity. The
Python comments detail the same steps as in the numeric example above.
import numpy as np

def cosine_similarity(x, y):
    # Ensure length of x and y are the same
    if len(x) != len(y):
        return None

    # Work with NumPy arrays so the element-wise operations below also work for list inputs
    x, y = np.asarray(x), np.asarray(y)

    # Compute the dot product between x and y
    dot_product = np.dot(x, y)

    # Compute the L2 norms (magnitudes) of x and y
    magnitude_x = np.sqrt(np.sum(x**2))
    magnitude_y = np.sqrt(np.sum(y**2))

    # Compute the cosine similarity
    cosine_similarity = dot_product / (magnitude_x * magnitude_y)

    return cosine_similarity
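
As a quick check, applying this function to the D1 and D2 vectors from the numeric example above should reproduce the hand-calculated value of roughly 0.4472:

d1 = np.array([1, 1, 1, 1, 1, 0, 0])  # 'the best data science course'
d2 = np.array([0, 0, 1, 1, 0, 1, 1])  # 'data science is popular'

print(cosine_similarity(d1, d2))  # ≈ 0.4472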


As an example, cosine similarity will be used to find the similarity between the documents in
the following corpus:

corpus = ['data science is one of the most important fields of science',
          'this is one of the best data science courses',
          'data scientists analyze data']


Using sklearn, we'll vectorize the documents:

from sklearn.feature_extraction.text import CountVectorizer

# Create a matrix to represent the corpus
X = CountVectorizer().fit_transform(corpus).toarray()

print(X)

OUT:

[[0 0 0 1 1 1 1 1 2 1 2 0 1 0]
[0 1 1 1 0 0 1 0 1 1 1 0 1 1]
[1 0 0 2 0 0 0 0 0 0 0 1 0 0]]
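
To see which word each column corresponds to, you can inspect the fitted vectorizer's vocabulary. This inspection step is not in the original walkthrough, and get_feature_names_out assumes a reasonably recent scikit-learn version:

vectorizer = CountVectorizer().fit(corpus)
print(vectorizer.get_feature_names_out())
# Columns are the alphabetically sorted vocabulary, e.g.:
# ['analyze' 'best' 'courses' 'data' 'fields' 'important' 'is' 'most'
#  'of' 'one' 'science' 'scientists' 'the' 'this']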


With the above vectors, we can now compute cosine similarity between the corpus
documents:

cos_sim_1_2 = cosine_similarity(X[0, :], X[1, :])
cos_sim_1_3 = cosine_similarity(X[0, :], X[2, :])
cos_sim_2_3 = cosine_similarity(X[1, :], X[2, :])

print('Cosine Similarity between: ')
print('\tDocument 1 and Document 2: ', cos_sim_1_2)
print('\tDocument 1 and Document 3: ', cos_sim_1_3)
print('\tDocument 2 and Document 3: ', cos_sim_2_3)

OUT:

Cosine Similarity between:


Document 1 and Document 2: 0.6885303726590962
Document 1 and Document 3: 0.21081851067789195
Document 2 and Document 3: 0.2721655269759087


Alternatively, cosine similarity can be calculated using functions defined in popular Python
libraries, for example sklearn.metrics.pairwise.cosine_similarity and the SciPy library's cosine
distance function, scipy.spatial.distance.cosine.

Here's an example of using sklearn's function:

from sklearn.metrics.pairwise import cosine_similarity


cos_sim_1_2 = cosine_similarity([X[0, :], X[1, :]])

print('Cosine Similarity between Document 1 and Document 2 is \n', cos_sim_1_2)

OUT:

Cosine Similarity between Document 1 and Document 2 is


[[1. 0.68853037]
[0.68853037 1. ]]


The results are the same as those from the function we defined earlier. Notice that the input
to sklearn's function is a matrix (here, a list of row vectors), and the output is also a matrix of
pairwise similarities.
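
For completeness, here is a small sketch of the SciPy route mentioned above. scipy.spatial.distance.cosine returns a cosine distance, so the corresponding similarity is one minus that value:

from scipy.spatial.distance import cosine

# SciPy computes cosine *distance*; similarity = 1 - distance
sim_1_2 = 1 - cosine(X[0, :], X[1, :])
print(sim_1_2)  # ≈ 0.6885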

Meet the Authors

Fatih Karabiber
Ph.D. in Computer Engineering, Data Scientist
Associate Professor of Computer Engineering. Author or co-author of over 30 journal
publications. Instructor of graduate and undergraduate courses. Supervisor of graduate
theses. Consultant to IT companies.

Editor: Rhys, Psychometrician
Editor: Brendan, Founder of LearnDataSci
