
Jaccard vs. PMI

The Jaccard similarity coefficient and pointwise mutual information (PMI) are two different association measures used to compare variables. The Jaccard coefficient ranges from 0 to 1 and measures similarity as the proportion of shared presences out of the total number of presences, without regard to unequal frequencies. PMI is the log of the ratio of the joint probability of two variables occurring together to the product of their individual probabilities. Without the log, PMI can be viewed as the Ochiai coefficient further normalized by the geometric mean of the two individual probabilities; Ochiai, unlike Jaccard, curbs similarity when the two frequencies are unequal. Both measures quantify association between variables, but they weigh different factors in their calculations.



10/1/2017 | probability - Jaccard similarity coefficient vs. Point-wise mutual information coefficient - Cross Validated


Jaccard similarity coefficient vs. Point-wise mutual information coefficient

Can you explain the difference between the Jaccard similarity coefficient and the pointwise mutual information (PMI) measure? It would
be great if you could add a few examples.

probability distance-functions mutual-information association-measure jaccard-similarity

asked Jan 17 at 12:11 by Moeen MH; edited Jan 17 at 14:00 by ttnphns

1 Answer

These two are quite different. Still, let us try to "bring them to a common denominator" to see the difference. Both Jaccard and PMI could be extended to the continuous-data case, but here we will consider the basic binary-data case.

Using the a, b, c, d convention of the 4-fold (2x2) table, as here,

            Y
          1   0
        ---------
      1 | a | b |
  X     ---------
      0 | c | d |
        ---------

a = number of cases where both X and Y are 1
b = number of cases where X is 1 and Y is 0
c = number of cases where X is 0 and Y is 1
d = number of cases where both X and Y are 0
a + b + c + d = n, the total number of cases.

we know that Jaccard[X, Y] = a / (a + b + c).

PMI, by the Wikipedia definition, is PMI[X, Y] = log( P(X, Y) / (P(X) P(Y)) ).
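As a concrete illustration (a minimal Python sketch, not from the original post; the function names and example vectors are mine), both measures can be computed directly from the a, b, c, d counts of two binary vectors:

```python
import math

def counts(x, y):
    """Return the (a, b, c, d) cells of the 2x2 table for binary vectors x, y."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = counts(x, y)
    return a / (a + b + c)

def pmi(x, y):
    a, b, c, d = counts(x, y)
    n = a + b + c + d
    # P(X,Y) / (P(X) P(Y)) simplifies to a*n / ((a+b)(a+c)) in a,b,c,d notation
    return math.log(a * n / ((a + b) * (a + c)))

x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 1, 0, 0, 1, 0]
print(jaccard(x, y))  # a=3, b=1, c=1 -> 3/5 = 0.6
print(pmi(x, y))      # log(3*8 / (4*4)) = log(1.5)
```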

Let us first forget about the "log", because Jaccard involves no logarithm. Then plug the a, b, c, d notation into the PMI formula to obtain:

  P(X, Y) / (P(X) P(Y)) = (a/n) / [ ((a+b)/n) * ((a+c)/n) ]
                        = a*n / ((a+b)(a+c))
                        = Ochiai[X, Y] / gm[P(X), P(Y)]

where "gm" is the geometric mean of the two probabilities, and the Ochiai similarity between the X and Y vectors is just another name for cosine similarity in the case of binary data:

  Ochiai[X, Y] = a / sqrt((a+b)(a+c)) = sqrt( (a/(a+b)) * (a/(a+c)) ).
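The identity above can be sanity-checked numerically (a small sketch with made-up counts; the helper name is mine):

```python
import math

def ochiai(a, b, c):
    """Ochiai (binary cosine) similarity from 2x2 table counts."""
    return a / math.sqrt((a + b) * (a + c))

# Identity check with example counts a=3, b=1, c=1, d=3 (n=8):
# exp(PMI) = a*n / ((a+b)(a+c)) should equal Ochiai / gm(P(X), P(Y)).
a, b, c, d = 3, 1, 1, 3
n = a + b + c + d
pmi_no_log = a * n / ((a + b) * (a + c))
gm = math.sqrt((a + b) / n * (a + c) / n)
assert abs(pmi_no_log - ochiai(a, b, c) / gm) < 1e-12
```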

So you can see that PMI (without the logarithm) is the Ochiai coefficient further "normalized" (or, I'd say, de-normalized) by the geometric mean of the two marginal probabilities of the positive (eventful) outcome.

But Jaccard and Ochiai are comparable. Both are association measures ranging from 0 to 1. They differ in the emphasis they put on a potential discrepancy between the frequencies b and c. I've described it in the answer that "Ochiai" above links to. To cite:

Because product (seen in Ochiai) increases weaker than sum (seen in Jaccard) when only
one of the terms grows, Ochiai will be really high only if both of the two proportions
(probabilities) are high, which implies that to be considered similar by Ochiai the two
vectors must share the great shares of their attributes/elements. In short, Ochiai curbs
similarity if b and c are unequal. Jaccard does not.
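A small numeric illustration of the structural difference (example numbers and function names are mine, not from the post): Jaccard depends on b and c only through their sum, whereas Ochiai depends on the product (a+b)(a+c), so Ochiai reacts to how the discordant total b+c is split between b and c while Jaccard does not.

```python
import math

def jaccard_abc(a, b, c):
    """Jaccard similarity from 2x2 table counts."""
    return a / (a + b + c)

def ochiai_abc(a, b, c):
    """Ochiai (binary cosine) similarity from 2x2 table counts."""
    return a / math.sqrt((a + b) * (a + c))

# Same a = 10 and same total discrepancy b + c = 8, split differently:
print(jaccard_abc(10, 4, 4), ochiai_abc(10, 4, 4))  # 0.5556..., 10/14
print(jaccard_abc(10, 8, 0), ochiai_abc(10, 8, 0))  # 0.5556..., 10/sqrt(180)
```

Jaccard is identical in both rows; only Ochiai distinguishes the two splits.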

answered Jan 17 at 13:56 by ttnphns; edited Apr 13 at 12:44 by Community

Source: https://stats.stackexchange.com/questions/256684/jaccard-similarity-coecient-vs-point-wise-mutual-information-coecient/25