Association Rule Mining
Data Mining
Pinar Duygulu
1
Why?
• Retailers now have massive databases full of transactional history
• Each transaction is simply a date and a list of items
• Is it possible to gain insights from this data?
• How are items in a database associated?
• Association Rules predict members of a set given other members in the set
Why?
• Example Rules:
• 98% of customers that purchase tires get automotive services done
• Customers who buy mustard and ketchup also buy burgers
• Goal: find these rules from just transactional data
• Rules help with: store layout, buying patterns, add-on sales, etc.
Association rule mining
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied
extensively by the database and data mining
community.
• Assume all data are categorical.
• No good algorithm for numeric data.
• Initially used for Market Basket Analysis to find how
items purchased by customers are related.
4
The model: data
• I = {i1, i2, …, im}: a set of items.
• Transaction t:
• t is a set of items, and t ⊆ I.
• Transaction Database T: a set of transactions T = {t1,
t2, …, tn}.
5
Transaction data: supermarket data
• Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}
• Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it may have TID
(transaction ID)
• A transactional dataset: A set of transactions
6
Slide from Bing Liu
Transaction data: a set of documents
• A text document data set. Each document is treated as
a “bag” of keywords
doc1: Student, Teach, School
doc2: Student, School
doc3: Teach, School, City, Game
doc4: Baseball, Basketball
doc5: Basketball, Player, Spectator
doc6: Baseball, Coach, Game, Team
doc7: Basketball, Team, City, Game
Market-Basket transactions
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules:
{Diaper} ⇒ {Beer}
{Milk, Bread} ⇒ {Eggs, Coke}
{Beer, Bread} ⇒ {Milk}

Implication means co-occurrence, not causality!
8
Applications – (1)
• Items = products; baskets = sets of products someone bought in one
trip to the store.
• Example application: given that many people buy beer and diapers
together:
• Run a sale on diapers; raise price of beer.
• Only useful if many buy diapers & beer.
9
Applications – (2)
• Baskets = sentences; items = documents containing those sentences.
• Items that appear together too often could represent plagiarism.
10
Applications – (3)
• Baskets = Web pages; items = words.
• Unusual words appearing together in a large number of documents,
e.g., “Brad” and “Angelina,” may indicate an interesting relationship.
11
Frequent Itemset
• Itemset
  • A collection of one or more items
  • Example: {Milk, Bread, Diaper}
• k-itemset
  • An itemset that contains k items
• Support count (σ)
  • Frequency of occurrence of an itemset
  • E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  • Fraction of transactions that contain an itemset
  • E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  • An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Definition: Association Rule
• Association Rule
  – An implication expression of the form X ⇒ Y, where X and Y are itemsets
  – Example: {Milk, Diaper} ⇒ {Beer}
• Rule Evaluation Metrics
  – Support (s): fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example: {Milk, Diaper} ⇒ {Beer}
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
15
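To make the two metrics concrete, here is a small Python sketch (my own, not code from the lecture) that recomputes the support and confidence of {Milk, Diaper} ⇒ {Beer} on the five transactions above:

```python
# Five market-basket transactions from the slides, as Python sets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """sigma(itemset): number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y):
    """Support and confidence of the rule X => Y."""
    s = support_count(X | Y) / len(transactions)
    c = support_count(X | Y) / support_count(X)
    return s, c

print(rule_metrics({"Milk", "Diaper"}, {"Beer"}))   # (0.4, 0.666...) = (s, c)
```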
Support and Confidence
• Support is important because
• A rule that has a low support may occur simply by chance
• A low support rule also is likely to be uninteresting from a business
perspective because it may not be profitable
• Confidence measures the reliability of the rule
16
Association Rule Mining Task
• Given a set of transactions T, the goal of association rule mining is to
find all rules having
• support ≥ minsup threshold
• confidence ≥ minconf threshold
• Brute-force approach:
• List all possible association rules
• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
17
Mining Association Rules
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Rules:
{Milk, Diaper} ⇒ {Beer}    (s=0.4, c=0.67)
{Milk, Beer} ⇒ {Diaper}    (s=0.4, c=1.0)
{Diaper, Beer} ⇒ {Milk}    (s=0.4, c=0.67)
{Beer} ⇒ {Milk, Diaper}    (s=0.4, c=0.67)
{Diaper} ⇒ {Milk, Beer}    (s=0.4, c=0.5)
{Milk} ⇒ {Diaper, Beer}    (s=0.4, c=0.5)
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
18
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset, where each rule is a binary
partitioning of a frequent itemset
19
Frequent Itemset Generation
[Figure: the itemset lattice over items A–E, from the null set up to ABCDE; every itemset in the lattice is a candidate, so there are M = 2^d candidates for d items. Each of the N transactions below (maximum width w) is matched against the list of M candidates.]
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
• Match each transaction against every candidate
• Complexity ~ O(NMw) ⇒ expensive, since M = 2^d !!!
21
Computational Complexity
• Given d unique items:
  • Total number of itemsets = 2^d
  • Total number of possible association rules:
    R = Σ_{k=1}^{d-1} [ C(d, k) × Σ_{j=1}^{d-k} C(d-k, j) ] = 3^d − 2^(d+1) + 1
22
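As a quick sanity check on the closed form, the following sketch (not part of the slides) enumerates every rule X ⇒ Y with disjoint, non-empty X and Y for small d and compares the count against 3^d − 2^(d+1) + 1:

```python
from itertools import combinations

def rule_count_brute(d):
    """Count rules X => Y with X, Y non-empty, disjoint subsets of d items."""
    items = range(d)
    count = 0
    for k in range(1, d):                        # size of the antecedent X
        for X in combinations(items, k):
            rest = [i for i in items if i not in X]
            for j in range(1, len(rest) + 1):    # size of the consequent Y
                count += sum(1 for _ in combinations(rest, j))
    return count

for d in range(2, 7):
    closed_form = 3**d - 2**(d + 1) + 1
    assert rule_count_brute(d) == closed_form
    print(d, closed_form)    # e.g. d = 6 gives 602 possible rules
```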
Frequent Itemset Generation Strategies
• Reduce the number of candidates (M)
• Complete search: M = 2^d
• Use pruning techniques to reduce M
23
Reducing Number of Candidates
• Apriori principle:
• If an itemset is frequent, then all of its subsets must also be
frequent
• In other words, if an itemset is infrequent, all of its supersets
must also be infrequent
∀ X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)
• Support of an itemset never exceeds the support of its subsets
• This is known as the anti-monotone property of support
24
Illustrating Apriori Principle
[Itemset lattice from null to ABCDE: once an itemset (e.g., AB) is found to be infrequent, all of its supersets are pruned from the search.]
25
Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets):
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Pairs (2-itemsets) (no need to generate candidates involving Coke or Eggs):
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Triplets (3-itemsets)
26
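A minimal sketch (my own, assuming the five transactions above and minimum support 3) that reproduces these counts and shows why candidates containing Coke or Eggs never need to be generated:

```python
from itertools import combinations
from collections import Counter

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
minsup = 3

# 1-itemset counts: Coke (2) and Eggs (1) fall below minsup and are dropped.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = {i for i, c in item_counts.items() if c >= minsup}
print(frequent_items)

# 2-itemset candidates are built only from frequent items (Apriori principle).
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t & frequent_items), 2):
        pair_counts[pair] += 1
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= minsup}
print(frequent_pairs)   # Bread-Milk, Bread-Diaper, Milk-Diaper, Beer-Diaper, each with count 3
```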
Apriori Algorithm
• Method:
• Let k=1
• Generate frequent itemsets of length 1
• Repeat until no new frequent itemsets are identified
• Generate length (k+1) candidate itemsets from length k frequent
itemsets
• Prune candidate itemsets containing subsets of length k that are
infrequent
• Count the support of each candidate by scanning the DB
• Eliminate candidates that are infrequent, leaving only those that
are frequent
27
The Apriori Algorithm (Pseudo-Code)
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
28
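The pseudo-code maps fairly directly onto Python. The sketch below is my own (not the lecture's implementation); it represents itemsets as frozensets, joins frequent k-itemsets, prunes candidates with an infrequent subset, and counts support with one database scan per level:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= minsup_count}
    all_frequent = dict(Lk)

    k = 1
    while Lk:
        # Candidate generation: join frequent k-itemsets, then prune by subsets.
        prev = list(Lk)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in Lk for sub in combinations(union, k)
                ):
                    candidates.add(union)

        # Support counting by a single scan of the database.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= minsup_count}
        all_frequent.update(Lk)
        k += 1
    return all_frequent
```

It can be run on the market-basket transactions above, e.g. apriori(transactions, minsup_count=3).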
The Apriori Algorithm—An Example
30
Implementation of Apriori
• How to generate candidates?
• Step 1: self-joining Lk
• Step 2: pruning
31
Example of Candidates Generation
• Assume the items in Lk are listed in an order (e.g., alphabetical)
• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3*L3
  – abcd from abc and abd
  – acde from acd and ace
• Pruning: acde is removed because its subset ade is not in L3
• C4 = {abcd}
33
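A small sketch (mine, not from the slides) of this ordered self-join on L3: two k-itemsets are joined when they share their first k−1 items, and a candidate is pruned if any of its k-subsets is infrequent:

```python
from itertools import combinations

L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")]

def generate_candidates(Lk):
    """F(k-1) x F(k-1) join: merge itemsets sharing their first k-1 items, then prune."""
    Lk = sorted(Lk)
    frequent = set(Lk)
    k = len(Lk[0])
    candidates = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            if Lk[i][:-1] == Lk[j][:-1]:              # same (k-1)-prefix
                cand = Lk[i] + (Lk[j][-1],)           # e.g. abc + abd -> abcd
                # prune: every k-subset of the candidate must be frequent
                if all(sub in frequent for sub in combinations(cand, k)):
                    candidates.append(cand)
    return candidates

print(generate_candidates(L3))
# [('a', 'b', 'c', 'd')] -- acde is generated by the join but pruned, since ade is not in L3
```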
Brute-force method for generating candidates
34
F(k-1) × F(1)
35
F(k-1) × F(k-1)
36
Further Improvement of the Apriori Method
• Major computational challenges
• Multiple scans of transaction database
• Huge number of candidates
• Tedious workload of support counting for candidates
37
Reducing Number of Comparisons
• Candidate counting:
• Scan the database of transactions to determine the support
of each candidate itemset
• To reduce the number of comparisons, store the candidates
in a hash structure
• Instead of matching each transaction against every candidate,
match it against candidates contained in the hashed buckets
39
Subset Operation – Support Counting
Given a transaction t, what are
the possible subsets of size 3?
Transaction t = {1, 2, 3, 5, 6}
[Figure: level-by-level enumeration of the 3-subsets of t (123, 125, 126, 135, 136, 156, 235, 236, 256, 356) and the candidate hash tree used to locate the buckets they can fall into.]
42
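As a rough sketch of the subset operation (assuming the transaction t = {1, 2, 3, 5, 6} and the 1,4,7 / 2,5,8 / 3,6,9 hash function from the figure), the only candidates a transaction can support are its own 3-subsets, and hashing an item selects a branch of the hash tree:

```python
from itertools import combinations

t = (1, 2, 3, 5, 6)

# All 3-subsets of the transaction: the only candidates t can possibly support.
subsets = list(combinations(t, 3))
print(len(subsets), subsets)          # 10 subsets: (1,2,3), (1,2,5), ..., (3,5,6)

def h(item):
    """Hash function from the figure: 1,4,7 -> branch 0; 2,5,8 -> 1; 3,6,9 -> 2."""
    return (item - 1) % 3

# Hashing on the first item of each subset tells which root branch to descend.
for s in subsets:
    print(s, "-> branch", h(s[0]))
```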
Subset Operation Using Hash Tree
Hash Function: 1,4,7 / 2,5,8 / 3,6,9
Transaction: 1 2 3 5 6
[Figure: the transaction is split into partial paths 1+{2 3 5 6}, 2+{3 5 6}, 3+{5 6}; 1+{2 3 5 6} is further split into 12+{3 5 6}, 13+{5 6}, 15+{6}; each partial path is hashed down the candidate hash tree.]
43
Subset Operation Using Hash Tree
Hash Function: 1,4,7 / 2,5,8 / 3,6,9
Transaction: 1 2 3 5 6
[Figure: continuing the descent reaches the leaf buckets whose candidates are compared against the transaction.]
Match transaction against 11 out of 15 candidates
44
Factors Affecting Complexity
• Choice of minimum support threshold
• lowering support threshold results in more frequent itemsets
• this may increase number of candidates and max length of frequent
itemsets
• Dimensionality (number of items) of the data set
• more space is needed to store support count of each item
• if number of frequent items also increases, both computation and I/O
costs may also increase
• Size of database
• Since Apriori makes multiple passes, run time of algorithm may increase
with number of transactions
• Average transaction width
• transaction width increases with denser data sets
• This may increase max length of frequent itemsets and traversals of hash
tree (number of subsets in a transaction increases with its width)
45
Compact Representation of Frequent Itemsets
• Some itemsets are redundant because they have identical support as their supersets
  [Example figure: a dataset for which the number of frequent itemsets is 3 × Σ_{k=1}^{10} C(10, k)]
• It is useful to identify a small representative set of itemsets from which all other
frequent itemsets can be derived
46
Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets is frequent
[Itemset lattice over A–E: the border separates frequent from infrequent itemsets; the maximal frequent itemsets are the frequent itemsets immediately below the border.]
47
Maximal Frequent Itemsets
• They form the smallest set of itemsets from which all frequent
itemsets can be derived
48
Closed Itemset
• Provide a minimal representation of itemsets without losing support information
• An itemset is closed if none of its immediate supersets has the same
support as the itemset
49
Maximal vs Closed Itemsets
TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE
[Itemset lattice over A–E, each itemset annotated with the IDs of the transactions that contain it (e.g., A: 1,2,4; AB: 1,2; ABC: 1,2); itemsets such as ABCDE are not supported by any transaction.]
50
Maximal vs Closed Frequent Itemsets
Minimum support = 2
[Same lattice with support counts: e.g., ABC and ACD are closed and maximal; AC and BC are closed but not maximal, because they have frequent supersets.]
# Closed = 9
# Maximal = 4
51
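The counts above can be re-derived by brute force. The following sketch (mine, using the slide's minimum support of 2 and the five transactions ABC, ABCD, BCE, ACDE, DE) reproduces # Closed = 9 and # Maximal = 4:

```python
from itertools import combinations

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
minsup = 2
items = sorted(set().union(*transactions))

def sup(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

# All frequent itemsets, by brute-force enumeration of the lattice.
frequent = {frozenset(c): sup(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if sup(c) >= minsup}

def immediate_supersets(s):
    return [s | {i} for i in items if i not in s]

# Closed: no immediate superset has the same support.
closed = [s for s in frequent
          if all(sup(u) < frequent[s] for u in immediate_supersets(s))]
# Maximal: no immediate superset is frequent.
maximal = [s for s in frequent
           if all(u not in frequent for u in immediate_supersets(s))]

print(len(frequent), len(closed), len(maximal))   # 14 frequent, 9 closed, 4 maximal
```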
Why are closed patterns interesting?
53
Slide from Evimaria Terzi
Maximal vs Closed Itemsets
[Venn diagram: Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets]
54
Alternative Algorithm – FP-Growth
FP-Growth: Frequent Pattern Growth
If the FP-tree is small enough to fit in memory, frequent itemsets can be extracted directly in memory
56
Example: FP-Growth
• The first scan of the data is the same as Apriori: derive the set of frequent 1-itemsets
• Let min-sup = 2
• Generate a list of items ordered by support count

Transactional Database
TID   List of item IDs
T100  I1, I2, I5
T200  I2, I4
T300  I2, I3
T400  I1, I2, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I1, I2, I3, I5
T900  I1, I2, I3

Item ID   Support count
I2        7
I1        6
I3        6
I4        2
I5        2
57
Construct the FP-Tree
Transactional Database
TID Items TID Items TID Items
T100 I1,I2,I5 T400 I1,I2,I4 T700 I1,I3
T200 I2,I4 T500 I1,I3 T800 I1,I2,I3,I5
T300 I2,I3 T600 I2,I3 T900 I1,I2,I3
58
Construct the FP-Tree
Item ID   Support count
I2        7
I1        6
I3        6
I4        2
I5        2

Resulting FP-tree (each transaction is inserted with its frequent items in support order, sharing common prefixes):
null
  I2:7
    I1:4
      I5:1
      I4:1
      I3:2
        I5:1
    I3:2
    I4:1
  I1:2
    I3:2

When a branch of a transaction is added, the count for each node along a common prefix is incremented by 1
63
FP-growth properties
If the tree does not fit into main memory, partition the database
Efficient and scalable for mining both long and short frequent patterns
68
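As a rough illustration of the construction just described (my own sketch, not code from the lecture; it assumes min-sup = 2 and the nine-transaction database above), each transaction is inserted with its frequent items ordered by decreasing support, sharing prefix paths:

```python
class FPNode:
    """A node of the FP-tree: an item, its count, and its children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, minsup):
    # First scan: global support counts.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= minsup}

    # Second scan: insert each transaction along a shared prefix path.
    root = FPNode(None, None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.setdefault(item, FPNode(item, node))
            child.count += 1        # count along the common prefix is incremented
            node = child
    return root

db = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
      ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
      ["I1", "I2", "I3"]]
tree = build_fp_tree(db, minsup=2)
print({item: node.count for item, node in tree.children.items()})   # {'I2': 7, 'I1': 2}
```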
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset, where each rule is a binary
partitioning of a frequent itemset
69
Re-Definition: Association Rule
Let D be a database of transactions, e.g.:
Transaction ID   Items
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F
70
Generating Association Rules
Once the frequent itemsets have been found, it is straightforward to generate
strong association rules that satisfy:
minimum Support
minimum confidence
Confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)
71
Generating Association Rules
• For each frequent itemset L, generate all non-empty subsets S of L
• For every non-empty subset S, output the rule S ⇒ (L − S) if support_count(L) / support_count(S) ≥ minimum confidence
72
Example
Suppose the frequent itemset L = {I1, I2, I5}
Non-empty proper subsets of L: {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}

Transactional Database
TID   List of item IDs
T100  I1, I2, I5
T200  I2, I4
T300  I2, I3
T400  I1, I2, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I1, I2, I3, I5
T900  I1, I2, I3

Association rules:
I1 ∧ I2 ⇒ I5   confidence = 2/4 = 50%
I1 ∧ I5 ⇒ I2   confidence = 2/2 = 100%
I2 ∧ I5 ⇒ I1   confidence = 2/2 = 100%
I1 ⇒ I2 ∧ I5   confidence = 2/6 = 33%
I2 ⇒ I1 ∧ I5   confidence = 2/7 = 29%
I5 ⇒ I1 ∧ I2   confidence = 2/2 = 100%

If the minimum confidence = 70%, only the rules with 100% confidence are output as strong rules
73
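A short sketch (mine, not from the slides) that reproduces these confidences by enumerating the non-empty proper subsets of L = {I1, I2, I5} over the database above:

```python
from itertools import combinations

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]

def sup_count(itemset):
    return sum(1 for t in db if itemset <= t)

L = frozenset({"I1", "I2", "I5"})
min_conf = 0.70

# Every non-empty proper subset S of L yields one candidate rule S => (L - S).
for k in range(1, len(L)):
    for antecedent in combinations(sorted(L), k):
        S = frozenset(antecedent)
        conf = sup_count(L) / sup_count(S)
        verdict = "strong" if conf >= min_conf else "rejected"
        print(sorted(S), "=>", sorted(L - S), f"confidence = {conf:.0%} ({verdict})")
```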
Rule Generation
• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that the rule f ⇒ L − f satisfies the minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, candidate rules:
  ABC ⇒ D, ABD ⇒ C, ACD ⇒ B, BCD ⇒ A,
  A ⇒ BCD, B ⇒ ACD, C ⇒ ABD, D ⇒ ABC,
  AB ⇒ CD, AC ⇒ BD, AD ⇒ BC, BC ⇒ AD,
  BD ⇒ AC, CD ⇒ AB
74
Rule Generation
• How to efficiently generate rules from frequent itemsets?
• In general, confidence does not have an anti-monotone property
c(ABC ⇒ D) can be larger or smaller than c(AB ⇒ D)
• But confidence of rules generated from the same itemset has an anti-monotone property
• e.g., L = {A,B,C,D}: c(ABC ⇒ D) ≥ c(AB ⇒ CD) ≥ c(A ⇒ BCD)
• Confidence is anti-monotone with respect to the number of items on the right-hand side of the rule
75
Rule Generation for Apriori Algorithm
Lattice of rules for the frequent itemset {A,B,C,D}: ABCD ⇒ {} at the top, then BCD ⇒ A, ACD ⇒ B, ABD ⇒ C, ABC ⇒ D, and so on down to single-item antecedents
[Figure: once a rule in the lattice is found to have low confidence, all rules whose consequent is a superset of its consequent can be pruned.]
76
Rule Generation for Apriori Algorithm
• A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
  • e.g., join(CD ⇒ AB, BD ⇒ AC) would produce the candidate rule D ⇒ ABC
• Prune rule D ⇒ ABC if its subset AD ⇒ BC does not have high confidence
77
Problems with the association mining
• Single minsup: It assumes that all items in the data
are of the same nature and/or have similar
frequencies.
• Not true: In many applications, some items appear
very frequently in the data, while others rarely
appear.
E.g., in a supermarket, people buy food processors and cooking pans much less frequently than they buy bread and milk.
78
Effect of Support Distribution
• Many real data sets have skewed support distribution
[Figure: support distribution of a retail data set]
79
Rare Item Problem
• If the frequencies of items vary a great deal, we will
encounter two problems
• If minsup is set too high, those rules that involve rare items
will not be found.
• To find rules that involve both frequent and rare items,
minsup has to be set very low. This may cause
combinatorial explosion because those frequent items will
be associated with one another in all possible ways.
• Using a single minimum support threshold may not be
effective
80
Multiple minsups model
• Each item can have its own user-specified minimum item support (MIS)
81
Minsup of a rule
• The minsup of a rule is the lowest MIS value among the items in the rule
82
An Example
• Consider the following items:
bread, shoes, clothes
The user-specified MIS values are as follows:
MIS(bread) = 2% MIS(shoes) = 0.1%
MIS(clothes) = 0.2%
The following rule doesn’t satisfy its minsup:
clothes ⇒ bread [sup = 0.15%, conf = 70%]
The following rule satisfies its minsup:
clothes ⇒ shoes [sup = 0.15%, conf = 70%]
83
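A minimal sketch of this check (treat it as an illustrative helper, not code from the lecture), where a rule's minsup is taken as the lowest MIS among its items:

```python
# User-specified per-item MIS thresholds, as fractions.
MIS = {"bread": 0.02, "shoes": 0.001, "clothes": 0.002}

def satisfies_minsup(rule_items, support):
    """A rule's minsup is the smallest MIS of the items it contains."""
    return support >= min(MIS[item] for item in rule_items)

print(satisfies_minsup({"clothes", "bread"}, 0.0015))   # False: 0.15% < MIS(clothes) = 0.2%
print(satisfies_minsup({"clothes", "shoes"}, 0.0015))   # True:  0.15% >= MIS(shoes) = 0.1%
```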
Pattern Evaluation
• Association rule algorithms tend to produce too many
rules
• many of them are uninteresting or redundant
• Redundant if {A,B,C} ⇒ {D} and {A,B} ⇒ {D} have the same support & confidence
84
Application of Interestingness Measure
Interestingness
Measures
85
Computing Interestingness Measure
• Given a rule X ⇒ Y, the information needed to compute rule interestingness can be obtained from a contingency table
86
Drawback of Confidence
          Coffee   No Coffee   Total
Tea         15          5        20
No Tea      75          5        80
Total       90         10       100

Confidence(Tea ⇒ Coffee) = 15/20 = 0.75 looks high, yet P(Coffee) = 0.9: knowing that a person drinks tea actually lowers the chance that the person drinks coffee.
87
Statistical-based Measures
• Measures that take into account statistical dependence
Lift = P(Y|X) / P(Y)

Interest = P(X,Y) / ( P(X) P(Y) )

PS = P(X,Y) − P(X) P(Y)

φ-coefficient = ( P(X,Y) − P(X) P(Y) ) / sqrt( P(X) [1 − P(X)] P(Y) [1 − P(Y)] )
88
Example: Lift/Interest
          Coffee   No Coffee   Total
Tea         15          5        20
No Tea      75          5        80
Total       90         10       100

Lift(Tea ⇒ Coffee) = P(Coffee|Tea) / P(Coffee) = 0.75 / 0.9 ≈ 0.83 < 1, so Tea and Coffee are negatively associated despite the seemingly high confidence.
89
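A quick sketch (values taken from the contingency table above) computing confidence and lift for Tea ⇒ Coffee:

```python
# Counts from the Tea/Coffee contingency table.
n = 100
tea_and_coffee = 15
tea = 20
coffee = 90

p_coffee = coffee / n                   # P(Coffee) = 0.9
confidence = tea_and_coffee / tea       # P(Coffee | Tea) = 0.75
lift = confidence / p_coffee            # 0.75 / 0.9 ~= 0.833

print(confidence, lift)   # lift < 1: Tea and Coffee are negatively associated,
                          # even though the confidence looks high
```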
Subjective Interestingness Measure
• Objective measure:
• Rank patterns based on statistics computed from data
• e.g., 21 measures of association (support, confidence, Laplace,
Gini, mutual information, Jaccard, etc).
• Subjective measure:
• Rank patterns according to user’s interpretation
• A pattern is subjectively interesting if it contradicts the
expectation of a user (Silberschatz & Tuzhilin)
• A pattern is subjectively interesting if it is actionable
(Silberschatz & Tuzhilin)
90
Interestingness via Unexpectedness
• Need to model expectation of users (domain knowledge)
[Diagram: '+' marks patterns expected to be frequent, '-' patterns expected to be infrequent; patterns that agree with the expectation are expected patterns, patterns that contradict it are unexpected patterns]
93
Association Rule Discovery: Hash tree
Hash Function: items 1, 4, 7 hash to the left branch; 2, 5, 8 to the middle; 3, 6, 9 to the right
Candidate Hash Tree over 15 candidate 3-itemsets:
{2,3,4} {5,6,7} {1,4,5} {1,3,6} {3,4,5} {3,5,6} {3,6,7} {3,5,7} {3,6,8} {1,2,4} {1,5,9} {6,8,9} {1,2,5} {4,5,7} {4,5,8}
[Figure: hashing on 1, 4 or 7 at the root leads to the leftmost subtree of the candidate hash tree]
94
Association Rule Discovery: Hash tree
[Figure: the same candidate hash tree; hashing on 2, 5 or 8 at the root leads to the middle subtree]
95
Association Rule Discovery: Hash tree
[Figure: the same candidate hash tree; hashing on 3, 6 or 9 at the root leads to the rightmost subtree]
96
FP-growth Algorithm
• Use a compressed representation of the database using an FP-tree
97
FP-tree construction
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

After reading TID=1:
null
  A:1
    B:1

After reading TID=2:
null
  A:1
    B:1
  B:1
    C:1
      D:1
98
FP-Tree Construction
Transaction Database
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

FP-tree after all ten transactions:
null
  A:7
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:3
    C:3
      D:1
      E:1
99
FP-growth
[Figure: the paths of the FP-tree ending in D]
Conditional Pattern base for D:
P = { (A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1) }
Recursively apply FP-growth on P
Frequent Itemsets found (with sup > 1): AD, BD, CD, ACD, BCD
100