Module-IV (Frequent Pattern & Association Rule Mining)


Frequent Item-sets & Association Rules
Frequent Item Set & Association Rules Mining
Association rule mining
Given a set of transactions, each containing some number of items from a given collection, the aim is to produce dependency rules that predict the occurrence of an item based on the occurrences of other items.

Transaction  TID  Items bought
T1           10   Beer, Nuts, Diaper
T2           20   Beer, Coke, Diaper
T3           30   Bread, Diaper, Eggs
T4           40   Nuts, Eggs, Milk
T5           50   Nuts, Coffee, Diaper, Eggs, Milk

Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication ("→") means co-occurrence, not causality!
Customers who purchase Bread have a chance of also purchasing Milk; retail organizations try to find such associations between products that can be sold together.
Note: Assume all data are categorical; there is no good algorithm for numeric data.
Example of Association rule

Suppose the rule discovered is:
  bread and butter → Milk [with confidence level 90%]
  (If: antecedent)    (then: consequent)
The "IF" part is the antecedent (bread and butter); the "THEN" part is the consequent (Milk).
Note: The antecedent and consequent are disjoint (i.e., they have no items in common).
Such a rule:
• Can be used to determine what should be done to boost the sales of Milk.
• Can be used to see which products would be affected if the store discontinues selling bread and butter.
• Can be used to see what products should be sold with bread and butter to promote the sale of Milk!
Transaction data: Supermarket data
Concepts:
• I: the set of all items sold in the store. {Bread, Milk,...}
• A transaction: items purchased in a basket; it may
have TID (transaction ID)
• A transactional dataset: A set of transactions
• Itemset: A collection of one or more items
{Milk, Bread, Diaper}
k-itemset: An itemset that contains k items
Class Exercise Transaction data: Supermarket data
 Determine the frequent itemsets.
Market-Basket Model
What is Market Basket Analysis?
 It is a technique which identifies the strength of association between pairs of products purchased together and identifies patterns of co-occurrence. A co-occurrence is when two or more things take place together.
 It creates If-Then scenario rules, for example, if item A is purchased then item B is likely to be
purchased. The rules are probabilistic in nature or, in other words, they are derived from the
frequencies of co-occurrence in the observations. Frequency is the proportion of baskets that contain the
items of interest. The rules can be used in pricing strategies, product placement, inventory
management, content placement in e-commerce sites, advertising strategy, store layout, and
various types of cross-selling strategies.
 Market Basket Analysis takes data at transaction level, which lists all items bought by a customer in a
single purchase. The technique determines relationships of what products were purchased with which
other product(s). These relationships are then used to build profiles containing If-Then rules of the
items purchased.
 The rules could be written as: If {A} Then {B}. The If part of the rule (the {A}) is known as the
antecedent and the THEN part of the rule is known as the consequent (the {B}). The antecedent is the
condition and the consequent is the result.
What is Market Basket Analysis?
 For example, you are in a supermarket to buy milk. Referring to the below example, there are nine baskets
containing varying combinations of milk, cheese, apples, and bananas.
 Question: are you more likely to buy apples or cheese in the same transaction than somebody who did not buy milk?
 The next step is to determine the relationships and the rules. So, association rule mining is applied in this context.
It is a procedure which aims to observe frequently occurring patterns, correlations, or associations from datasets
found in various kinds of databases such as relational databases, transactional databases, and other forms of
repositories.
Market-Basket Analysis
• Market Basket Analysis (Association Analysis) is a mathematical modeling
technique based upon the theory that if you buy a certain group of items, you
are likely to buy another group of items.

• It is used to analyze customer purchasing behavior, and it helps in increasing sales and maintaining inventory by focusing on point-of-sale transaction data.
Market-Basket Analysis
• Consider a shopping cart filled with several items
• Market basket analysis tries to answer the following questions:
  • Who makes purchases?
  • What items tend to be purchased together?
    • obvious: steak and potatoes; beer and pretzels
  • What items are purchased sequentially?
    • obvious: house and furniture; car and tires
  • In what order do customers purchase items?
  • What items tend to be purchased by season?
• It is also about what customers do not purchase, and why:
  • If customers purchase baking powder, but no flour, what are they baking?
  • If customers purchase a mobile phone, but no case, are you missing an opportunity?
Market-Basket Analysis

• Categorize customer purchase behavior
• Identify actionable information:
  • purchase profiles
  • profitability of each purchase profile
  • use for marketing:
    • layouts or catalogs
    • select products for promotion
    • space allocation, product placement
Market-Basket Analysis
A database of customer transactions:
• Each transaction is a set of items
• Example: the transaction with TID 111 contains the items {Pen, Ink, Milk, Juice}

TID  CID  Date    Item   Qty
111  201  5/1/99  Pen    2
111  201  5/1/99  Ink    1
111  201  5/1/99  Milk   3
111  201  5/1/99  Juice  6
112  105  6/3/99  Pen    1
112  105  6/3/99  Ink    1
112  105  6/3/99  Milk   1
113  106  6/5/99  Pen    1
113  106  6/5/99  Milk   1
114  201  7/1/99  Pen    2
114  201  7/1/99  Ink    2
114  201  7/1/99  Juice  4
Market-Basket Analysis
Co-occurrences
• 80% of all customers purchase items X, Y and Z together.
Association rules
• 60% of all customers who purchase X and Y also buy Z.
Sequential patterns
• 60% of customers who first buy X also purchase Y within
three weeks.

Association rule mining
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied extensively by the database and data mining community.
• Assume all data are categorical.
• No good algorithm exists for numeric data.
• Initially used for Market Basket Analysis to find how items purchased by customers are related.

Example: Bread → Milk [sup = 5%, conf = 100%]
Association Rule Mining Task
The association rule has three measures that express the degree of confidence in the rule: Support, Confidence, and Lift.

Support:
 The number of transactions that include both {A} and {B}, as a percentage of the total number of transactions. It is a measure of how frequently the collection of items (here {A} and {B}) occurs together, as a percentage of all transactions.
 The probability that a transaction contains A ∪ B.
Example: Support(milk) = 6/9, Support(cheese) = 7/9, Support(milk & cheese) = 6/9.
Association Rule Mining Task
Confidence:
 The ratio of the number of transactions that include both {A} and {B} to the number of transactions that include all items in {A}.
 Confidence is the conditional probability that a transaction containing A also contains B: Pr(B|A).
Example: Confidence(milk => cheese) = (milk & cheese)/(milk) = 6/6 = 100%.

Lift:
 The lift of the rule A => B is the confidence of the rule divided by the expected confidence, assuming that the itemsets A and B are independent of each other.
Example: Lift(milk => cheese) = [(milk & cheese)/(milk)] / [cheese/Total] = [6/6] / [7/9] = 1/0.777 ≈ 1.29.
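These three measures can be computed directly from transaction data. Below is a minimal Python sketch using a hypothetical nine-basket dataset that is consistent with the counts above (six baskets with milk and cheese, one with cheese only, two with neither); the function names are illustrative, not from the slides.

# The nine baskets are hypothetical, chosen only to match the counts above:
# milk 6/9, cheese 7/9, milk & cheese 6/9.
baskets = [
    {"milk", "cheese"}, {"milk", "cheese"}, {"milk", "cheese"},
    {"milk", "cheese"}, {"milk", "cheese"}, {"milk", "cheese"},
    {"cheese", "apple"},
    {"apple", "banana"}, {"banana"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(A, B, transactions):
    """Conditional probability Pr(B | A) = sup(A and B) / sup(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Confidence of A => B divided by the expected confidence sup(B)."""
    return confidence(A, B, transactions) / support(B, transactions)

print(support({"milk"}, baskets))                 # 6/9 = 0.667
print(confidence({"milk"}, {"cheese"}, baskets))  # 6/6 = 1.0
print(lift({"milk"}, {"cheese"}, baskets))        # 1 / (7/9) = 1.286 (> 1)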
Class Exercise Transaction data: Supermarket data

Basket  Item(s)                    Support  Confidence  Lift
        Milk                       ?
        Cheese                     ?
1       Milk => Cheese             ?        ?           ?
2       Apple, Milk                ?
3       (Apple, Milk) => Cheese    ?        ?           ?
4       Apple, Cheese              ?
5       (Apple, Cheese) => Milk    ?        ?           ?
Association Rule Mining Task

The goal of association rule mining is to find all rules having
 support ≥ minsup threshold
 confidence ≥ minconf threshold

Brute-force approach:
• List all possible association rules
• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
[Computationally prohibitive!]
The model: Association rules
• A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
• An association rule is an implication of the form:
  X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
• An item-set is a set of items.
  • E.g., X = {milk, bread, cereal} is an item-set.
• A k-item-set is an item-set with k items.
  • E.g., {milk, bread, cereal} is a 3-itemset.
Association Rule Mining Task
Association Rule Mining is a two-step approach:

• Step 1: Generate all itemsets whose support ≥ minsup (frequent itemset generation).
  This step exploits a special category of properties called anti-monotonicity: if a set cannot pass a test, all of its supersets will fail the same test as well.

• Step 2: Generate high-confidence rules from each frequent itemset (rule generation), where each rule is a binary partitioning of a frequent item-set.
Association Rule Mining Task [Brute-force Approach]

Given d items, there are 2^d - 1 possible candidate item-sets.
Association Rule Mining Algorithm: Apriori Algorithm
 The Apriori algorithm uses frequent itemsets to generate association rules, and with the help of these association rules it determines how strongly or how weakly two objects are connected.
 It is an iterative process for finding the frequent itemsets in a large dataset.
 It is designed to work on databases that contain transactions. It is mainly used for market basket analysis and helps to find those products that can be bought together. It can also be used in the healthcare field to find drug reactions for patients.

Apriori says that an itemset I is not frequent if:
 P(I) < minimum support threshold; then I is not frequent.
 And if an itemset I has support less than the minimum support, then all of its supersets (I + A) will also fall below the minimum support, and can thus be ignored. This property is called the anti-monotone property.
Apriori Algorithm
• The steps followed to generate the frequent itemsets in the Apriori algorithm are:

 Join Step: This step generates candidate (k+1)-itemsets from the frequent k-itemsets by joining them with each other.
  [The algorithm uses a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. The frequent subsets are extended one item at a time (this step is known as the candidate generation process).]

 Prune Step: This step scans the support count of each candidate itemset in the database. If a candidate does not meet the minimum support, it is regarded as infrequent and is removed. This step is performed to reduce the size of the candidate itemsets.
  [From the candidate list Ck of k-itemsets it extracts the frequent list Lk of k-itemsets using the support count.]

 Finally, find the association rules from the frequent itemsets by calculating the confidence value. Association rules that do not meet the minimum confidence are ignored.
Apriori Algorithm Problem

For the following given dataset, generate rules using the Apriori algorithm. Consider support count = 2 (or 22%) and confidence = 50%.
Apriori Algorithm Problem
Step-1, Iteration 1:
Find C1, which contains the support count of each itemset of length 1. From C1, find L1, which contains the length-1 itemsets that meet the minimum support count.

C1                        L1
Itemset  Support Count    Itemset  Support Count
A        6                A        6
B        7                B        7
C        6                C        6
D        2                D        2
E        1
Apriori Algorithm Problem contd...
Step-1, Iteration 2:
With the help of L1, find C2, which contains the support count of each itemset of length 2. From C2, find L2, which contains the length-2 itemsets that meet the minimum support count.

C2                        L2
Itemset  Support Count    Itemset  Support Count
{A, B}   4                {A, B}   4
{A, C}   4                {A, C}   4
{A, D}   1                {B, C}   4
{B, C}   4                {B, D}   2
{B, D}   2
{C, D}   0
Apriori Algorithm Problem contd...
Step-1, Iteration 3:
With the help of L2, find C3, which contains the support count of each itemset of length 3. From C3, find L3, which contains the length-3 itemsets that meet the minimum support count.

C3                          L3
Itemset     Support Count   Itemset     Support Count
{A, B, C}   2               {A, B, C}   2
{B, C, D}   0
{A, C, D}   0
{A, B, D}   0

Now, as L3 has only one combination, i.e., {A, B, C}, no further joining is possible.
Apriori Algorithm Problem contd...
Step-2: Association Rule Generation
Create a new table of Association Rules (AR) with all the possible rules from the frequent itemset {A, B, C}. For each rule, calculate the Confidence(A => B) using the formula sup(A^B)/sup(A). After calculating the confidence value for all rules, exclude the rules that have less confidence than the minimum threshold (50%).

Rules      Confidence
A^B → C    sup{(A^B)^C}/sup(A^B) = 2/4 = 0.50 = 50%
B^C → A    sup{(B^C)^A}/sup(B^C) = 2/4 = 0.50 = 50%
A^C → B    sup{(A^C)^B}/sup(A^C) = 2/4 = 0.50 = 50%
C → A^B    sup{C^(A^B)}/sup(C)   = 2/6 = 0.33 = 33.33%
A → B^C    sup{A^(B^C)}/sup(A)   = 2/6 = 0.33 = 33.33%
B → A^C    sup{B^(A^C)}/sup(B)   = 2/7 = 0.29 = 28.57%

As the given minimum confidence threshold is 50%, the first three rules, A^B → C, B^C → A, and A^C → B, can be considered strong association rules for the given problem.
APRIORI ALGORITHM EXAMPLE
[Worked example shown as a figure on the original slides.]
Class Exercise
Class Exercise Solution
[Exercise and solution shown as figures on the original slides.]
Generating Association Rules
From frequent item-sets
• Procedure 1:
• Suppose we have the list of frequent item-sets.

• Generate all nonempty subsets of each frequent item-set I:
  • For I = {1,3,5}, all nonempty subsets are {1,3},{1,5},{3,5},{1},{3},{5}
  • For I = {2,3,5}, all nonempty subsets are {2,3},{2,5},{3,5},{2},{3},{5}
Generating Association Rules
From frequent item-sets
• Procedure 2:
• For every nonempty subset S of I, output the rule:
  S → (I - S)
  if support_count(I)/support_count(S) ≥ min_conf,
  where min_conf is the minimum confidence threshold.

• Let us assume:
  • the minimum confidence threshold is 60%
Association Rules with confidence
• R1 : 1,3 -> 5
– Confidence = sc{1,3,5}/sc{1,3} = 2/3 = 66.66% (R1 is selected)
• R2 : 1,5 -> 3
– Confidence = sc{1,5,3}/sc{1,5} = 2/2 = 100% (R2 is selected)
• R3 : 3,5 -> 1
– Confidence = sc{3,5,1}/sc{3,5} = 2/3 = 66.66% (R3 is selected)
• R4 : 1 -> 3,5
– Confidence = sc{1,3,5}/sc{1} = 2/3 = 66.66% (R4 is selected)
• R5 : 3 -> 1,5
– Confidence = sc{3,1,5}/sc{3} = 2/4 = 50% (R5 is REJECTED)
• R6 : 5 -> 1,3
– Confidence = sc{5,1,3}/sc{5} = 2/4 = 50% (R6 is REJECTED)
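Rules R1 to R6 can be reproduced with a short sketch of Procedure 2; the support counts sc{...} are taken from the example above, and the helper name is illustrative:

from itertools import combinations

# Support counts from the example (sc = support_count)
sc = {frozenset(s): n for s, n in [
    ((1,), 3), ((3,), 4), ((5,), 4),
    ((1, 3), 3), ((1, 5), 2), ((3, 5), 3),
    ((1, 3, 5), 2),
]}

def generate_rules(I, min_conf):
    """For every nonempty proper subset S of I, output S -> (I - S)
    when support_count(I) / support_count(S) >= min_conf."""
    I = frozenset(I)
    for k in range(len(I) - 1, 0, -1):          # larger antecedents first
        for S in map(frozenset, combinations(sorted(I), k)):
            conf = sc[I] / sc[S]
            verdict = "selected" if conf >= min_conf else "REJECTED"
            print(sorted(S), "->", sorted(I - S), f"{conf:.2%}", verdict)

generate_rules({1, 3, 5}, 0.60)   # reproduces R1-R6 above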
How to efficiently generate rules?
• In general, confidence does not have an anti-monotone property:
  c(ABC → D) can be larger or smaller than c(AB → D)

• But the confidence of rules generated from the same item-set does have an anti-monotone property.
  E.g., for L = {A, B, C, D}:
  c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
  Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.
Rule generation for Apriori Algorithm
[Figure on the original slides: the lattice of rules generated from a frequent itemset, with the low-confidence rules pruned.]
Apriori Algorithm Flow
Apriori Algorithm Pseudo Code
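The flow chart and pseudocode were shown as figures on the original slides. As a stand-in, here is a compact, runnable Python sketch of the same level-wise loop, applied to the beer/diaper transactions from the first slide (names are mine):

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(cands):
        # Count each candidate and keep those meeting min_count.
        return {c: n for c in cands
                if (n := sum(c <= t for t in transactions)) >= min_count}

    # L1: frequent 1-itemsets
    Lk = keep_frequent({frozenset([i]) for t in transactions for i in t})
    freq = dict(Lk)
    while Lk:
        # Join: candidate (k+1)-itemsets from pairs of frequent k-itemsets
        cands = {a | b for a in Lk for b in Lk if len(a | b) == len(a) + 1}
        Lk = keep_frequent(cands)   # prune by support count
        freq.update(Lk)
    return freq

# The transactions from the first slide, with minimum support count 2:
T = [{"Beer", "Nuts", "Diaper"}, {"Beer", "Coke", "Diaper"},
     {"Bread", "Diaper", "Eggs"}, {"Nuts", "Eggs", "Milk"},
     {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"}]
for s, n in apriori(T, 2).items():
    print(sorted(s), n)             # e.g. ['Beer', 'Diaper'] 2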
Class Exercise
Q1: Find the frequent itemsets and generate association rules for them. Illustrate it with a step-by-step process.

Minimum support = 2
Minimum confidence = 50%

Transaction  List of items
T1           I1, I2, I3
T2           I2, I3, I4
T3           I4, I5
T4           I1, I2, I4
T5           I1, I2, I3, I5
T6           I1, I2, I3, I4

Q2: Choose a minimum support and minimum confidence of your choice and find the frequent itemsets.
Advantages
• Uses the large itemset property
• Easily parallelized
• Easy to implement
Disadvantages
• Assumes the transaction database is memory resident
• Requires many database scans
Frequent Itemset, Closed Itemset and Maximal Itemset

TID  Itemset
1    {A, C, D}
2    {B, C, E}
3    {A, B, C, E}
4    {B, E}
5    {A, B, C, E}
Frequent Itemset, Closed Itemset and Maximal Itemset
Frequent Itemset:
For the above transaction database with minimum support count 2, the itemsets and their supports are:

Total itemsets: 2^5 - 1 = 31
Frequent itemsets: 15
Infrequent itemsets: 16

Itemset  Support  FR/In-Fr    Itemset  Support  FR/In-Fr
A        3/5      Freq        ABC      2/5      Freq
B        4/5      Freq        ABD      0/5      In-fre
C        4/5      Freq        ABE      2/5      Freq
D        1/5      In-fre      ACD      1/5      In-fre
E        4/5      Freq        ACE      2/5      Freq
AB       2/5      Freq        ADE      0/5      In-fre
AC       3/5      Freq        BCD      0/5      In-fre
AD       1/5      In-fre      BCE      3/5      Freq
AE       2/5      Freq        BDE      0/5      In-fre
BC       3/5      Freq        CDE      0/5      In-fre
BD       0/5      In-fre      ABCD     0/5      In-fre
BE       4/5      Freq        ABCE     2/5      Freq
CD       1/5      In-fre      ABDE     0/5      In-fre
CE       3/5      Freq        ACDE     0/5      In-fre
DE       0/5      In-fre      BCDE     0/5      In-fre
                              ABCDE    0/5      In-fre
Frequent Itemset, Closed Itemset and Maximal Itemset
Apriori Principle:
• If an itemset is infrequent, then all its supersets are infrequent.
• If an itemset is frequent, then all its subsets are frequent.
(The support table above illustrates this principle.)
Frequent Itemset, Closed Itemset and Maximal Itemset
Due to the Apriori Principle, the frequent itemsets are:

Itemset  Support
A        3/5
B        4/5
C        4/5
E        4/5
AB       2/5
AC       3/5
AE       2/5
BC       3/5
BE       4/5
CE       3/5
ABC      2/5
ABE      2/5
ACE      2/5
BCE      3/5
ABCE     2/5
Frequent Itemset, Closed Itemset and Maximal Itemset
Closed Itemset:
An itemset X is a closed itemset if
1. X is frequent, and
2. no immediate superset of X has the same support as X
[i.e., if at least one immediate superset has the same support as X, then X is not closed].

Itemset  Support  FR/In-Fr  CL/NCL
A        3/5      Freq      NCL
B        4/5      Freq      NCL
C        4/5      Freq      CL
E        4/5      Freq      NCL
AB       2/5      Freq      NCL
AC       3/5      Freq      CL
AE       2/5      Freq      NCL
BC       3/5      Freq      NCL
BE       4/5      Freq      CL
CE       3/5      Freq      NCL
ABC      2/5      Freq      NCL
ABE      2/5      Freq      NCL
ACE      2/5      Freq      NCL
BCE      3/5      Freq      CL
ABCE     2/5      Freq      CL
Frequent Itemset, Closed Itemset and Maximal Itemset
Maximal Itemset:
An itemset X is a maximal itemset if
1. X is frequent, and
2. no immediate superset of X is frequent.

Itemset  Support  FR/In-Fr  CL/NCL  Max/NMax
A        3/5      Freq      NCL     NMax
B        4/5      Freq      NCL     NMax
C        4/5      Freq      CL      NMax
E        4/5      Freq      NCL     NMax
AB       2/5      Freq      NCL     NMax
AC       3/5      Freq      CL      NMax
AE       2/5      Freq      NCL     NMax
BC       3/5      Freq      NCL     NMax
BE       4/5      Freq      CL      NMax
CE       3/5      Freq      NCL     NMax
ABC      2/5      Freq      NCL     NMax
ABE      2/5      Freq      NCL     NMax
ACE      2/5      Freq      NCL     NMax
BCE      3/5      Freq      CL      NMax
ABCE     2/5      Freq      CL      MAX
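Both definitions translate directly into code. The following small sketch recomputes the frequent, closed and maximal itemsets for the five transactions above, assuming a minimum support count of 2 (variable names are mine):

from itertools import combinations

T = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"},
     {"B", "E"}, {"A", "B", "C", "E"}]
items = {i for t in T for i in t}

def sup(s):
    """Number of transactions containing every item of s."""
    return sum(set(s) <= t for t in T)

# All frequent itemsets for minimum support count 2
freq = {frozenset(c) for k in range(1, len(items) + 1)
        for c in combinations(sorted(items), k) if sup(c) >= 2}

def immediate_supersets(x):
    return [x | {i} for i in items - x]

# Closed: no immediate superset has the same support (supports only shrink)
closed = {x for x in freq
          if all(sup(y) < sup(x) for y in immediate_supersets(x))}
# Maximal: no immediate superset is frequent
maximal = {x for x in freq
           if all(y not in freq for y in immediate_supersets(x))}
print(len(freq), len(closed), len(maximal))   # 15 5 1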
Frequent Itemset, Closed Itemset and Maximal Itemset
According to the given example:
• Total itemsets: 2^5 - 1 = 31 (excluding the empty set Ø)
• Frequent itemsets: 15
• Closed itemsets: 5 (C, AC, BE, BCE, ABCE)
• Maximal itemsets: 1 (ABCE)
The relation is: Maximal ⊆ Closed ⊆ Frequent.
Frequent Itemset, Closed Itemset and Maximal Itemset
Class Exercise
Q1: Find the frequent itemsets, closed itemsets and maximal itemsets from the following transaction table. Illustrate it with a step-by-step process.

TID  Itemset
100  {I1, I3, I4}
200  {I2, I3, I5}
300  {I1, I2, I3, I5}
400  {I2, I5}
Which Patterns Are Interesting? Pattern Evaluation Methods

• Most association rule mining algorithms employ a support-confidence framework.

• However, when mining at low support thresholds or mining for long patterns, this support-confidence approach fails or proves to be a bottleneck.

• In other words, the strong rules found by this method are not necessarily interesting.
Strong Rules are not Necessarily Interesting
Consider the following transaction DB and find the association rule.
Let min support = 30% and min confidence = 60%.

          game    ¬game   Sum(row)
video     4000    3500    7500
¬video    2000     500    2500
Sum(col)  6000    4000   10000

Is the association rule (game -> video) strong or not?
Ans:
support(game + video) = 4000/10000 = 40% > 30%
confidence(game -> video) = sup(game + video)/sup(game) = (4000/10000)/(6000/10000) = 40/60 = 66% > 60%
As support and confidence are both above the given thresholds, the rule (game -> video) is a strong rule.
However, the overall share of video purchases is 75% (7500/10000), which is more than the computed 66% confidence.
=> In fact, games and videos are negatively associated, not positively associated.
=> So all strong rules are not necessarily interesting.
From Association Analysis to Correlation Analysis
• Association rule analysis is a technique to uncover how items are associated/correlated with each other. Traditionally, association rule mining is performed by using two interestingness measures, named support and confidence, to evaluate rules.
• However, the drawback of the confidence measure is that it might misrepresent the importance of an association, as discussed in the last example. To account for the base popularity of both constituent items, a third measure called lift (interest) is used.

A => B [support, confidence, correlation]

• Support: this says how popular an itemset is, as measured by the proportion of transactions that contain it.
• Confidence: this says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions with item X in which item Y also appears.
• Lift: the lift value of an association rule is the ratio of the confidence of the rule to the expected confidence of the rule. It says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is.
Correlation Analysis
Correlation analysis says how much related two items are. There are two techniques to find the correlation of objects: Lift and the Chi-square method.
• In probability, two events A and B are independent if P(A ∪ B) = P(A) × P(B).
• The measure of dependent/correlated events is lift:
  lift(A, B) = P(A ∪ B) / (P(A) × P(B))
Correlation Concepts
Lift(A -> B) measures how much A and B are correlated:

• If Lift = 1 ⇒ corr(A,B) = 1: the occurrence of A is independent of the occurrence of B, i.e., there is no correlation between them.
• If Lift > 1 ⇒ corr(A,B) > 1: A and B are positively correlated, i.e., the occurrence of one implies the occurrence of the other.
• If Lift < 1 ⇒ corr(A,B) < 1: A and B are negatively correlated, i.e., the occurrence of A discourages the occurrence of B.
Interestingness Measure: Correlations (Lift)

          game    ¬game   Sum(row)
video     4000    3500    7500
¬video    2000     500    2500
Sum(col)  6000    4000   10000

lift(game, video) = P(game ∪ video) / (P(game) × P(video)) = 0.40 / (0.60 × 0.75) = 0.89
Since the lift is less than 1, game and video are negatively correlated.
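The same computation in a few lines of Python (variable names are mine):

# Contingency-table figures from the example above
n = 10000
p_game = 6000 / n
p_video = 7500 / n
p_both = 4000 / n          # transactions containing both game and video

confidence = p_both / p_game            # = 0.667, above the 60% threshold
lift = p_both / (p_game * p_video)      # = 0.40 / 0.45 = 0.889
print(f"confidence = {confidence:.2f}, lift = {lift:.2f}")   # lift < 1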
Association Rule Mining Task
Given d items, the total number of possible association rules is R = 3^d - 2^(d+1) + 1.
If d = 6, R = 602 rules.
Association Rule Mining Algorithms
There are many association rule mining algorithms that use different strategies [computational efficiency] and data structures [memory requirements], but they all produce the same set of rules. In other words, given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined.

• Use pruning techniques to reduce the search space of M (i.e., 2^d).
• As the size of the itemsets increases, reduce the number of transactions, e.g., by DHP (Direct Hashing & Pruning) and by vertical-based mining algorithms.
• Use efficient data structures to store the candidates or transactions, so that it is not required to match every candidate against every transaction.

We will study only the Apriori algorithm.
