
Data Mining: Sunitha R S Asst Prof Dept of ISE, RIT

The document discusses several objective measures used to determine interestingness in data mining:
1. Support and confidence - Measure the frequency of patterns based on a contingency table but have drawbacks when patterns are eliminated due to thresholds.
2. Interest factor - Overcomes limitations of support and confidence by comparing pattern frequencies to a statistical independence baseline. Also known as lift.
3. Correlation analysis - Determines if variables are positively or negatively correlated based on interest factor values.
4. IS measure - Accounts for association between word pairs while considering the number of documents containing both words.

Data Mining

Sunitha R S
Asst Prof
Dept of ISE, RIT
Objective Measures of Interestingness
• Data-driven approach.
• Domain independent; requires minimal input from the users.
• Interestingness is usually computed from frequency counts tabulated in a contingency table.

        B      ~B
A       f11    f10    f1+
~A      f01    f00    f0+
        f+1    f+0    N

f1+ : support count of A
f+1 : support count of B
f0+ : support count of transactions not having A
f+0 : support count of transactions not having B
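
As a rough illustration of how the counts in the contingency table could be obtained, the sketch below tallies the four cells from a list of transactions. The function name `contingency_table` and the toy `baskets` data are hypothetical and not part of the original slides.

```python
from typing import Dict, Iterable, Set


def contingency_table(transactions: Iterable[Set[str]], a: str, b: str) -> Dict[str, int]:
    """Count the cells of the 2x2 contingency table for items a and b."""
    f11 = f10 = f01 = f00 = 0
    for t in transactions:
        has_a, has_b = a in t, b in t
        if has_a and has_b:
            f11 += 1          # both A and B present
        elif has_a:
            f10 += 1          # A present, B absent
        elif has_b:
            f01 += 1          # A absent, B present
        else:
            f00 += 1          # neither present
    return {
        "f11": f11, "f10": f10, "f01": f01, "f00": f00,
        "f1+": f11 + f10,     # support count of A
        "f0+": f01 + f00,     # transactions not having A
        "f+1": f11 + f01,     # support count of B
        "f+0": f10 + f00,     # transactions not having B
        "N": f11 + f10 + f01 + f00,
    }


# Hypothetical toy data, for illustration only.
baskets = [{"tea", "coffee"}, {"coffee"}, {"tea"}, {"milk"}, {"coffee", "milk"}]
table = contingency_table(baskets, "tea", "coffee")
print(table)
```

The row and column totals are derived from the four cells, so they always add up to N.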
Objective Measures of Interestingness

• Four measures are studied here:


1. Support and Confidence
2. Interest Factor
3. Correlation Analysis
4. IS Measure
Support and Confidence
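Support for the rule X -> Y is s(X -> Y) = f11 / N, the fraction of all N transactions that contain both X and Y (taking A = X and B = Y in the contingency table).

Confidence for the rule X -> Y is c(X -> Y) = f11 / f1+, the fraction of the transactions containing X that also contain Y.
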

Drawback of Support and Confidence

• Drawback of support: interesting patterns may get eliminated because they fall below the support threshold.

• Drawback of confidence: it does not take into account the support of the itemset in the rule consequent.
Support-Confidence example
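For the rule Tea -> Coffee:

        Coffee  ~Coffee
Tea     150     50      200
~Tea    650     150     800
        800     200     1000

Support = 150/1000 = 15%
Confidence = 150/200 = 75%
But the probability that a person drinks coffee, regardless of whether they drink tea, is 800/1000 = 80%. The 75% confidence is therefore misleading: knowing that a person drinks tea actually lowers the chance that they drink coffee.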
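
The arithmetic of the tea/coffee example can be checked with a short sketch. The counts come from the table in this example; the function name `support_confidence_baseline` is a hypothetical helper, not something defined in the slides.

```python
def support_confidence_baseline(f11: int, f10: int, f01: int, f00: int):
    """Return (support, confidence, P(consequent)) for the rule A -> B."""
    n = f11 + f10 + f01 + f00
    support = f11 / n                  # fraction of transactions with A and B
    confidence = f11 / (f11 + f10)     # fraction of A-transactions that have B
    p_consequent = (f11 + f01) / n     # fraction of all transactions that have B
    return support, confidence, p_consequent


# Counts for Tea -> Coffee taken from the example table.
s, c, p_coffee = support_confidence_baseline(f11=150, f10=50, f01=650, f00=150)
print(f"support={s:.0%}, confidence={c:.0%}, P(coffee)={p_coffee:.0%}")
# prints: support=15%, confidence=75%, P(coffee)=80%
```

Comparing the confidence with P(coffee) makes the drawback visible: the confidence looks high, yet it is below the share of coffee drinkers in the whole population.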
Interest Factor
• Used to overcome the drawback of the support-confidence measures.
• Interest factor compares the frequency of a pattern against a baseline computed under the statistical independence assumption.
• It is also known as lift; for binary variables, lift is equivalent to the interest factor.
• It is given by: I(A,B) = s(A,B) / (s(A) * s(B)) = N * f11 / (f1+ * f+1)
Interest Factor
• If A and B are statistically independent, then P(A,B) = P(A) * P(B).
• I(A,B) = 1 if A and B are independent,
         > 1 if A and B are positively correlated,
         < 1 if A and B are negatively correlated.

        Coffee  ~Coffee
Tea     150     50      200
~Tea    650     150     800
        800     200     1000

I(A,B) for the example considered (A = Tea, B = Coffee) is 0.15 / (0.20 * 0.80) = 0.9375.
This shows a slight negative correlation between drinking tea and drinking coffee.
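
A minimal sketch of the interest factor computed from the contingency counts, together with the three-way interpretation listed above. The names `interest_factor` and `interpret` are illustrative assumptions, and the tolerance used to decide "equal to 1" is an arbitrary choice.

```python
import math


def interest_factor(f11: int, f10: int, f01: int, f00: int) -> float:
    """I(A,B) = N * f11 / (f1+ * f+1), i.e. s(A,B) / (s(A) * s(B))."""
    n = f11 + f10 + f01 + f00
    f1_plus = f11 + f10    # support count of A
    f_plus1 = f11 + f01    # support count of B
    if f1_plus == 0 or f_plus1 == 0:
        raise ValueError("interest factor is undefined when A or B never occurs")
    return n * f11 / (f1_plus * f_plus1)


def interpret(i: float) -> str:
    """Map an interest factor value to the reading given in the slides."""
    if math.isclose(i, 1.0):
        return "independent"
    return "positively correlated" if i > 1.0 else "negatively correlated"


# Tea/coffee counts from the example table.
i_tea_coffee = interest_factor(150, 50, 650, 150)
print(i_tea_coffee, interpret(i_tea_coffee))
# prints: 0.9375 negatively correlated
```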
Limitations of Interest Factor

• The association between a pair of words should depend on the number of documents that contain both words; the interest factor does not always reflect this.
• Consider two pairs of words: “Data Mining” and “Compiler Mining”.
• I(p,q) = 1.0174 ≈ 1.02
• I(r,s) = 4.0816 ≈ 4.08
