


Cosine Similarity

What is Cosine Similarity?

Cosine similarity is a metric used to measure the similarity of two vectors. Specifically, it
measures the similarity in the direction or orientation of the vectors, ignoring differences in
their magnitude or scale. Both vectors need to belong to the same inner product space,
meaning their inner product must produce a scalar. The similarity of two vectors is measured
by the cosine of the angle between them.

How to calculate Cosine Similarity

We define cosine similarity mathematically as the dot product of the vectors divided by the
product of their magnitudes. For example, if we have two vectors, A and B, the similarity
between them is calculated as:

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where

• $\theta$ is the angle between the vectors,
• $A \cdot B$ is the dot product of A and B, calculated as $A \cdot B = \sum_{i=1}^{n} A_i B_i$,
• $\|A\|$ represents the L2 norm or magnitude of a vector, calculated as $\|A\| = \sqrt{\sum_{i=1}^{n} A_i^2}$.

The similarity can take values between -1 and +1. Smaller angles between vectors produce
larger cosine values, indicating greater cosine similarity. For example:

• When two vectors have the same orientation, the angle between them is 0, and the
cosine similarity is 1.
• Perpendicular vectors have a 90-degree angle between them and a cosine similarity of
0.
• Opposite vectors have an angle of 180 degrees between them and a cosine similarity
of -1.

Here's a graphic showing pairs of vectors with similarities close to 1, close to 0, and close to -1.
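
These three cases can also be verified with a few lines of NumPy. The vectors below are made up purely for illustration, and cos_sim is a small helper written just for this check:

import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Same orientation -> similarity of 1
print(cos_sim(np.array([1, 2]), np.array([2, 4])))    # ≈ 1.0
# Perpendicular vectors -> similarity of 0
print(cos_sim(np.array([1, 0]), np.array([0, 3])))    # 0.0
# Opposite orientation -> similarity of -1
print(cos_sim(np.array([1, 2]), np.array([-1, -2])))  # ≈ -1.0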

Applications

Cosine similarity is beneficial for applications that use sparse data, such as word-count
vectors for text documents, market-basket transaction data, and recommendation systems,
because cosine similarity ignores 0-0 matches. Counting 0-0 matches in sparse data would
inflate similarity scores. Another commonly used metric that ignores 0-0 matches is Jaccard
similarity.
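
To make this concrete, here is a minimal illustrative sketch (the vectors and the simple-matching comparison are not part of the original example) contrasting cosine similarity with a simple matching coefficient, which does count 0-0 agreements:

import numpy as np

# Two sparse binary vectors that share only one non-zero feature
a = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
b = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])

# Simple matching coefficient counts the 0-0 positions as agreements,
# so the many shared zeros inflate the score
simple_matching = np.mean(a == b)                          # 0.8

# Cosine similarity ignores positions where both vectors are zero
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # 0.5
print(simple_matching, cosine)

For these two sparse vectors, simple matching looks high only because of the shared zeros, while cosine similarity reflects the single shared non-zero feature.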

Cosine Similarity is widely used in Data Science and Machine Learning applications.
Examples include measuring the similarity of:

• Documents in natural language processing
• Movies, books, videos, or users in recommendation systems
• Images in computer vision

Numerical Example

Suppose that our goal is to calculate the cosine similarity of the two documents given
below.

• Document 1 = 'the best data science course'
• Document 2 = 'data science is popular'

After creating a word table from the documents, the documents can be represented by the
following vectors:

      the   best   data   science   course   is   popular
D1     1     1      1        1         1      0      0
D2     0     0      1        1         0      1      1

Using these two vectors we can calculate cosine similarity. First, we calculate the dot
product of the vectors:

$$D_1 \cdot D_2 = 1\cdot0 + 1\cdot0 + 1\cdot1 + 1\cdot1 + 1\cdot0 + 0\cdot1 + 0\cdot1 = 2$$

Second, we calculate the magnitudes of the vectors:

$$\|D_1\| = \sqrt{1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2} = \sqrt{5}, \qquad \|D_2\| = \sqrt{0^2 + 0^2 + 1^2 + 1^2 + 0^2 + 1^2 + 1^2} = 2$$

Finally, cosine similarity is calculated by dividing the dot product by the product of the magnitudes:

$$\text{similarity} = \cos(\theta) = \frac{D_1 \cdot D_2}{\|D_1\| \, \|D_2\|} = \frac{2}{2\sqrt{5}} = \frac{1}{\sqrt{5}} \approx 0.4472$$

The angle between the vectors is calculated as:

$$\theta = \arccos(0.4472) \approx 63.43^\circ$$

Python Example

We will use NumPy to perform the cosine similarity calculations.

Below, we define a function that takes two vectors and returns their cosine similarity. The
Python comments detail the same steps as in the numeric example above.
import numpy as np

def cosine_similarity(x, y):
    # Ensure length of x and y are the same
    if len(x) != len(y):
        return None

    # Work with NumPy arrays so the element-wise operations below also work for list inputs
    x, y = np.asarray(x), np.asarray(y)

    # Compute the dot product between x and y
    dot_product = np.dot(x, y)

    # Compute the L2 norms (magnitudes) of x and y
    magnitude_x = np.sqrt(np.sum(x**2))
    magnitude_y = np.sqrt(np.sum(y**2))

    # Compute the cosine similarity
    cosine_similarity = dot_product / (magnitude_x * magnitude_y)

    return cosine_similarity
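
As a quick check, applying this function to the D1 and D2 vectors from the numeric example above should reproduce the hand-calculated value of roughly 0.4472:

d1 = np.array([1, 1, 1, 1, 1, 0, 0])  # 'the best data science course'
d2 = np.array([0, 0, 1, 1, 0, 1, 1])  # 'data science is popular'

print(cosine_similarity(d1, d2))  # ≈ 0.4472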


As an example, cosine similarity will be used to find the similarity between the documents in
the following corpus:

corpus = ['data science is one of the most important fields of science',
          'this is one of the best data science courses',
          'data scientists analyze data']


Using sklearn, we'll vectorize the documents:

from sklearn.feature_extraction.text import CountVectorizer

# Create a matrix to represent the corpus
X = CountVectorizer().fit_transform(corpus).toarray()

print(X)

OUT:

[[0 0 0 1 1 1 1 1 2 1 2 0 1 0]
[0 1 1 1 0 0 1 0 1 1 1 0 1 1]
[1 0 0 2 0 0 0 0 0 0 0 1 0 0]]
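
To see which word each column corresponds to, you can inspect the fitted vectorizer's vocabulary. This inspection step is not in the original walkthrough, and get_feature_names_out assumes a reasonably recent scikit-learn version:

vectorizer = CountVectorizer().fit(corpus)
print(vectorizer.get_feature_names_out())
# Columns are the alphabetically sorted vocabulary, e.g.:
# ['analyze' 'best' 'courses' 'data' 'fields' 'important' 'is' 'most'
#  'of' 'one' 'science' 'scientists' 'the' 'this']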


With the above vectors, we can now compute cosine similarity between the corpus
documents:

cos_sim_1_2 = cosine_similarity(X[0, :], X[1, :])
cos_sim_1_3 = cosine_similarity(X[0, :], X[2, :])
cos_sim_2_3 = cosine_similarity(X[1, :], X[2, :])

print('Cosine Similarity between: ')
print('\tDocument 1 and Document 2: ', cos_sim_1_2)
print('\tDocument 1 and Document 3: ', cos_sim_1_3)
print('\tDocument 2 and Document 3: ', cos_sim_2_3)

OUT:

Cosine Similarity between:


Document 1 and Document 2: 0.6885303726590962
Document 1 and Document 3: 0.21081851067789195
Document 2 and Document 3: 0.2721655269759087


Alternatively, cosine similarity can be calculated using functions defined in popular Python
libraries, for example sklearn.metrics.pairwise.cosine_similarity and the SciPy library's cosine
distance function, scipy.spatial.distance.cosine.

Here's an example of using sklearn's function:

from sklearn.metrics.pairwise import cosine_similarity


cos_sim_1_2 = cosine_similarity([X[0, :], X[1, :]])

print('Cosine Similarity between Document 1 and Document 2 is \n', cos_sim_1_2)

OUT:

Cosine Similarity between Document 1 and Document 2 is


[[1. 0.68853037]
[0.68853037 1. ]]


The results are the same as those from the function we defined earlier. Notice that the input
to sklearn's function is a matrix (here, a list of row vectors), and the output is also a matrix of
pairwise similarities.
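
For completeness, here is a small sketch of the SciPy route mentioned above. scipy.spatial.distance.cosine returns a cosine distance, so the corresponding similarity is one minus that value:

from scipy.spatial.distance import cosine

# SciPy computes cosine *distance*; similarity = 1 - distance
sim_1_2 = 1 - cosine(X[0, :], X[1, :])
print(sim_1_2)  # ≈ 0.6885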

Meet the Authors

Fatih Karabiber
Ph.D. in Computer Engineering, Data Scientist
Associate Professor of Computer Engineering. Author or co-author of over 30 journal
publications. Instructor of graduate and undergraduate courses. Supervisor of graduate
theses. Consultant to IT companies.

Editor: Rhys, Psychometrician
Editor: Brendan, Founder of LearnDataSci
