04 AssociationPatternMining
DATA MINING
ASSOCIATION
PATTERN MINING
DR PAUL HANCOCK
CURTIN UNIVERSITY
SEMESTER 2
Associations and patterns
Aggarwal Ch 4.1-4.4.2, 4.4.4
Algorithms
Aggarwal Ch 5.2, 5.4
Applications
Summary
Database T
n transactions T1, T2, ..., Tn
Proof:
If Sup(I) >= minsup and J is a subset of I
then Sup(J) >= Sup(I) >= minsup
then Sup(J) >= minsup
=> J is also frequent
Example:
Sup({Bread}) >= Sup({Bread, Milk})
Sup({Bread, Milk}) >= Sup({Bread, Eggs, Milk})
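The proof above can be checked numerically. The following is a minimal sketch, assuming support is counted as the number of transactions containing the itemset; the `support` helper and the five transactions are hypothetical and invented purely for illustration.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


# Hypothetical transactions (not from the slides).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Eggs", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs", "Milk", "Butter"},
    {"Milk"},
]

sup_bread = support({"Bread"}, transactions)
sup_bread_milk = support({"Bread", "Milk"}, transactions)
sup_bread_eggs_milk = support({"Bread", "Eggs", "Milk"}, transactions)

# A subset is contained in at least as many transactions as its superset.
print(sup_bread, sup_bread_milk, sup_bread_eggs_milk)
```

Every transaction that contains the larger itemset also contains each of its subsets, which is why the counts can only stay equal or shrink as items are added.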
Corollary:
Adding items to an itemset can never increase its support,
so frequent k-itemsets become rarer for large k.
k-itemsets
k >= 1
More interested in k > 2
Describes items that co-occur
Similar itemsets may have relations
Itemsets don't describe these relations

Association rules
Once we have mined our data for frequent patterns,
Describes the relation between different itemsets
Written as X => Y with some confidence measure
"If a shopper buys eggs, then it's 45% likely that they'll also buy cheese"
{Eggs} => {Cheese} with confidence 45%
[Figure: diagram relating {Eggs}, {Cheese}, and {Eggs, Cheese}]
COMP5009 – DATA MINING, CURTIN UNIVERSITY 14
CONFIDENCE
Criterion 1 – relevance
Criterion 2 – strength
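A minimal sketch of how a rule X => Y might be scored, assuming the usual definitions: confidence is Sup(X ∪ Y) / Sup(X), relevance means Sup(X ∪ Y) >= minsup, and strength means confidence >= minconf. The function names, thresholds, and transaction counts are hypothetical; the data are made up so that {Eggs} => {Cheese} comes out at 45%.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        raise ValueError("antecedent X never occurs in the database")
    return support(set(x) | set(y), transactions) / sup_x


def is_association_rule(x, y, transactions, minsup, minconf):
    """Check both criteria: relevance (support) and strength (confidence)."""
    relevant = support(set(x) | set(y), transactions) >= minsup
    return relevant and confidence(x, y, transactions) >= minconf


# Hypothetical database: 20 baskets with eggs, 9 of which also have cheese.
transactions = (
    [{"Eggs", "Cheese"}] * 9
    + [{"Eggs"}] * 11
    + [{"Cheese"}] * 5
    + [{"Milk"}] * 15
)

conf_eggs_cheese = confidence({"Eggs"}, {"Cheese"}, transactions)
accepted = is_association_rule(
    {"Eggs"}, {"Cheese"}, transactions, minsup=5, minconf=0.4
)
print(conf_eggs_cheese, accepted)
```

Note that confidence is not symmetric: in this made-up data {Cheese} => {Eggs} has confidence 9/14, not 9/20.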
Algorithm
Step 1: generate all candidate itemsets, 2^|U| − 1
Step 2: scan the database and count the number of occurrences for each itemset
Step 3: select itemsets with support ≥ minimum support

Impractical if d = |U| is large
2^1000 ~ 10^301 is an awful lot of trials
1000 is not a large universe
Probably suitable only for very small problems
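The three steps can be sketched directly for a tiny universe. This is an illustrative sketch only; the function name `brute_force_frequent_itemsets` and the transactions are hypothetical, and the loop visits all 2^|U| − 1 non-empty candidates, so it is usable only for very small |U|.

```python
from itertools import combinations


def brute_force_frequent_itemsets(transactions, minsup):
    """Enumerate every non-empty itemset and keep those with support >= minsup."""
    universe = sorted(set().union(*transactions))
    frequent = {}
    n_candidates = 0
    for k in range(1, len(universe) + 1):
        # Step 1: generate all candidate k-itemsets.
        for candidate in combinations(universe, k):
            n_candidates += 1
            items = frozenset(candidate)
            # Step 2: scan the database and count occurrences.
            count = sum(1 for t in transactions if items <= t)
            # Step 3: keep the itemset if it meets the minimum support.
            if count >= minsup:
                frequent[items] = count
    return frequent, n_candidates


# Hypothetical transactions over a universe of 6 items.
transactions = [
    {"Milk", "Yoghurt", "Eggs"},
    {"Milk", "Yoghurt", "Bread"},
    {"Milk", "Eggs", "Cheese"},
    {"Milk", "Yoghurt", "Eggs", "Bread"},
    {"Milk", "Cheese", "Butter"},
]

frequent, n_candidates = brute_force_frequent_itemsets(transactions, minsup=3)
print(n_candidates, "candidates tested,", len(frequent), "frequent")
```

With 6 items there are only 63 candidates, but each extra item doubles the count.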
Algorithm:
Create 1-itemsets with sup>=minsup
For N=1 … K-1 do
Join N-itemsets to make (N+1)-itemsets with sup>=minsup
https://developpaper.com/association-rule-mining-and-apriori-algorithm/
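The level-wise loop above can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the function name and transactions are hypothetical, and the subset-pruning step is the standard Apriori refinement that follows from the proof that subsets of frequent itemsets are frequent; it is not spelled out on the slide.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup."""

    def support(items):
        return sum(1 for t in transactions if items <= t)

    # Level 1: frequent 1-itemsets.
    universe = set().union(*transactions)
    level = {frozenset([i]) for i in universe if support(frozenset([i])) >= minsup}
    frequent = {s: support(s) for s in level}

    k = 1
    while level:
        # Join: merge pairs of frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        # Count: one database scan per surviving candidate.
        level = set()
        for c in candidates:
            n = support(c)
            if n >= minsup:
                level.add(c)
                frequent[c] = n
        k += 1
    return frequent


# Hypothetical transactions, invented for illustration.
transactions = [
    {"Milk", "Yoghurt", "Eggs"},
    {"Milk", "Yoghurt", "Bread"},
    {"Milk", "Eggs", "Cheese"},
    {"Milk", "Yoghurt", "Eggs", "Bread"},
    {"Milk", "Cheese", "Butter"},
]

result = apriori(transactions, minsup=3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this example the candidate {Milk, Yoghurt, Eggs} is never counted, because its subset {Yoghurt, Eggs} is not frequent.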
Lexicon:
Bread, Butter, Eggs, Milk, Yoghurt, Cheese
item      sup
Bread     2
Butter    1
Eggs      3
Milk      5
Yoghurt   3
Cheese    2

Order by support:
Milk, Yoghurt, Eggs, Bread, Cheese, Butter
[Figure: FP-tree for the example transactions, showing node counts]
Let minsup = 3
No frequent itemsets with Butter, Bread, or Cheese
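The counting, filtering, and ordering steps above can be sketched as below, as done in the first pass of FP-Growth. The slide's transactions are not shown, so the five transactions here are hypothetical and were chosen only to reproduce the support table; ties are broken alphabetically, which may differ from the slide's ordering of Yoghurt and Eggs.

```python
from collections import Counter

# Hypothetical transactions that reproduce the support counts in the table.
transactions = [
    ["Milk", "Yoghurt", "Eggs"],
    ["Milk", "Yoghurt", "Bread"],
    ["Milk", "Eggs", "Cheese"],
    ["Milk", "Yoghurt", "Eggs", "Bread"],
    ["Milk", "Cheese", "Butter"],
]
minsup = 3

# Pass 1: count the support of every single item.
counts = Counter(item for t in transactions for item in t)

# Keep frequent items, ordered by descending support (ties: alphabetical).
frequent_items = [
    item
    for item, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if n >= minsup
]

# Rewrite each transaction with infrequent items dropped and the rest
# in the global support order, ready for insertion into an FP-tree.
ordered = [[i for i in frequent_items if i in t] for t in transactions]

print(frequent_items)
for t in ordered:
    print(t)
```

Butter, Bread, and Cheese fall below minsup = 3 and are removed before any tree is built, so no itemset containing them is ever considered.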
Apriori: Considers all possible candidates
FP-Growth: Only candidates within the DB are considered