Cosine Similarity
Cosine similarity is a metric used to measure the similarity of two vectors. Specifically, it
measures the similarity in the direction or orientation of the vectors ignoring differences in
their magnitude or scale. Both vectors need to be part of the same inner product space,
meaning they must produce a scalar through inner product multiplication. The similarity of
two vectors is measured by the cosine of the angle between them.
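We define cosine similarity mathematically as the dot product of the vectors divided by the
product of their magnitudes. For example, if we have two vectors, A and B, the similarity
between them is calculated as:
$$\text{similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
where $\theta$ is the angle between the vectors, $A \cdot B = \sum_{i=1}^{n} A_i B_i$ is the dot
product of A and B, and $\|A\| = \sqrt{\sum_{i=1}^{n} A_i^2}$ is the magnitude (L2 norm) of A.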
The similarity can take values between -1 and +1. Smaller angles between vectors produce
larger cosine values, indicating greater cosine similarity. For example:
• When two vectors have the same orientation, the angle between them is 0, and the
cosine similarity is 1.
• Perpendicular vectors have a 90-degree angle between them and a cosine similarity of
0.
• Opposite vectors have an angle of 180 degrees between them and a cosine similarity
of -1.
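The three cases listed above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `cos_sim` and the sample vectors are illustrative choices, not part of the original article.

```python
import numpy as np

def cos_sim(a, b):
    """Hypothetical helper: cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product divided by the product of the L2 norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same orientation, different magnitude: similarity is approximately 1
same = cos_sim([1, 2], [2, 4])
# Perpendicular vectors: similarity is approximately 0
perpendicular = cos_sim([1, 0], [0, 3])
# Opposite orientation: similarity is approximately -1
opposite = cos_sim([1, 2], [-1, -2])

print(same, perpendicular, opposite)
```

Note that `[1, 2]` and `[2, 4]` differ in length but not in direction, which is why their similarity is still (up to floating-point rounding) 1.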
Here's a graphic showing two vectors with similarities close to 1, close to 0, and close to -1.
Cosine similarity is beneficial for applications that utilize sparse data, such as word
documents, transactions in market data, and recommendation systems because cosine
similarity ignores 0-0 matches. Counting 0-0 matches in sparse data would inflate similarity
scores. Another commonly used metric that ignores 0-0 matches is Jaccard Similarity.
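To illustrate the point about 0-0 matches, the sketch below pads two short binary vectors with a large number of shared zeros and compares cosine similarity with a measure that does count 0-0 matches (the simple matching coefficient, used here only as a contrast). The helper names `cos_sim` and `simple_matching` and the example vectors are assumptions made for illustration.

```python
import numpy as np

def cos_sim(a, b):
    """Hypothetical helper: cosine similarity of two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def simple_matching(a, b):
    """Hypothetical helper: fraction of positions that agree, counting 0-0 matches."""
    return float(np.mean(np.asarray(a) == np.asarray(b)))

a = [1, 1, 0, 1]
b = [1, 0, 1, 1]

# Append 1000 positions where both vectors are 0,
# e.g. vocabulary words that neither document contains
a_sparse = a + [0] * 1000
b_sparse = b + [0] * 1000

print("cosine:", cos_sim(a, b), "->", cos_sim(a_sparse, b_sparse))
print("simple matching:", simple_matching(a, b), "->", simple_matching(a_sparse, b_sparse))
```

The shared zeros add nothing to the dot product or to either magnitude, so the cosine similarity is unchanged, while the matching-based score moves toward 1.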
Cosine similarity is widely used in data science and machine learning applications, for
example to measure the similarity of text documents or of items in recommendation systems.
Numerical Example
Suppose that our goal is to calculate the cosine similarity of the two documents given
below.
After creating a word table from the documents, the documents can be represented by the
following vectors:
D1 = [1, 1, 1, 1, 1, 0, 0]
D2 = [0, 0, 1, 1, 0, 1, 1]
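Using these two vectors we can calculate cosine similarity. First, we calculate the dot
product of the vectors:
$$D1 \cdot D2 = 1 \cdot 0 + 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1 + 0 \cdot 1 = 2$$
Second, we calculate the magnitude of each vector:
$$\|D1\| = \sqrt{1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2} = \sqrt{5}$$
$$\|D2\| = \sqrt{0^2 + 0^2 + 1^2 + 1^2 + 0^2 + 1^2 + 1^2} = \sqrt{4} = 2$$
Finally, cosine similarity can be calculated by dividing the dot product by the product of
the magnitudes:
$$\text{similarity}(D1, D2) = \frac{D1 \cdot D2}{\|D1\| \, \|D2\|} = \frac{2}{\sqrt{5} \cdot 2} \approx 0.447$$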
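The calculation for D1 and D2 can be double-checked with a few lines of NumPy. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import numpy as np

d1 = np.array([1, 1, 1, 1, 1, 0, 0])
d2 = np.array([0, 0, 1, 1, 0, 1, 1])

# Step 1: dot product of the two vectors
dot_product = np.dot(d1, d2)

# Step 2: magnitude (L2 norm) of each vector
magnitude_d1 = np.linalg.norm(d1)
magnitude_d2 = np.linalg.norm(d2)

# Step 3: divide the dot product by the product of the magnitudes
similarity = dot_product / (magnitude_d1 * magnitude_d2)

print(dot_product, magnitude_d1, magnitude_d2, similarity)
```

The dot product comes out as 2 and the similarity as 1/sqrt(5), roughly 0.447.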
Python Example
Below, we define a function that takes two vectors and returns their cosine similarity. The
Python comments detail the same steps as in the numeric example above.
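import numpy as np

def cosine_similarity(x, y):
    # Compute the dot product of x and y
    dot_product = np.dot(x, y)
    # Compute the magnitude (L2 norm) of each vector
    magnitude_x = np.linalg.norm(x)
    magnitude_y = np.linalg.norm(y)
    # Divide the dot product by the product of the magnitudes
    cosine_similarity = dot_product / (magnitude_x * magnitude_y)
    return cosine_similarity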
As an example, cosine similarity will be employed to find the similarity between the
documents of a small corpus, with each document represented as a row vector of the matrix X:
print(X)
OUT:
[[0 0 0 1 1 1 1 1 2 1 2 0 1 0]
[0 1 1 1 0 0 1 0 1 1 1 0 1 1]
[1 0 0 2 0 0 0 0 0 0 0 1 0 0]]
With the above vectors, we can now compute cosine similarity between the corpus
documents:
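A minimal sketch of this step follows, assuming the three rows of X printed above are the document vectors. The helper name `cos_sim` and the variable names are hypothetical and not taken from the original code.

```python
import numpy as np

# Document vectors copied from the printed matrix X
X = np.array([
    [0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
])

def cos_sim(x, y):
    """Hypothetical helper: dot product divided by the product of the L2 norms."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

sim_1_2 = cos_sim(X[0], X[1])
sim_1_3 = cos_sim(X[0], X[2])
sim_2_3 = cos_sim(X[1], X[2])

print(f"documents 1 and 2: {sim_1_2:.4f}")
print(f"documents 1 and 3: {sim_1_3:.4f}")
print(f"documents 2 and 3: {sim_2_3:.4f}")
```

With these vectors, the first two documents are the most similar pair (about 0.69), while the third document shares only one nonzero position with each of the others and scores much lower.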
OUT:
Alternatively, cosine similarity can be calculated using functions defined in popular Python
libraries. Examples of such functions are sklearn.metrics.pairwise.cosine_similarity and the
SciPy library's cosine distance function, scipy.spatial.distance.cosine. Note that SciPy
returns a distance, which equals 1 minus the cosine similarity.
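The sketch below shows one way to call both library functions, assuming scikit-learn and SciPy are installed and reusing the rows of the matrix X printed earlier. Variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

# Document vectors copied from the printed matrix X
X = np.array([
    [0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
])

# scikit-learn: pass the whole matrix, get back a matrix of pairwise
# similarities between its rows
similarity_matrix = cosine_similarity(X)
print(similarity_matrix)

# SciPy: takes two 1-D vectors and returns the cosine *distance*,
# so subtract it from 1 to obtain the similarity
scipy_similarity = 1 - cosine(X[0], X[1])
print(scipy_similarity)
```

Entry `[i, j]` of `similarity_matrix` is the similarity between documents i and j, so the diagonal is 1 and the matrix is symmetric.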
OUT:
The results are the same as those from the function defined above. Notice that the input to
sklearn's function is a matrix, and the output is also a matrix.
Fatih Karabiber
Ph.D. in Computer Engineering, Data Scientist
Associate Professor of Computer Engineering. Author/co-author of
over 30 journal publications. Instructor of
graduate/undergraduate courses. Supervisor of Graduate thesis.
Consultant to IT Companies.