BD25
ALGORITHM
Motivation: Association Rule Mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Market-Basket transactions
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Applications: Association Rule Mining
• * → Maintenance Agreement
– What should the store do to boost Maintenance
Agreement sales?
• Home Electronics → *
– What other products should the store stock up on?
• Attached mailing in direct marketing
• Detecting “ping-ponging” of patients
• Marketing and Sales Promotion
• Supermarket shelf management
Definition: Frequent Itemset
• Itemset
– A collection of one or more items
•Example: {Milk, Bread, Diaper}
– k-itemset
•An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2 (TIDs 4 and 5 of the market-basket transactions)
• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
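The definitions above can be checked directly against the five market-basket transactions. The following is a minimal Python sketch; the names `TRANSACTIONS`, `support_count`, `support`, and `is_frequent` are illustrative choices, not part of the slides.

```python
# Market-basket transactions from the slides (TID -> set of items).
TRANSACTIONS = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}


def support_count(itemset, transactions=TRANSACTIONS):
    """Support count: number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)


def support(itemset, transactions=TRANSACTIONS):
    """Support: fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, minsup, transactions=TRANSACTIONS):
    """An itemset is frequent when its support is >= the minsup threshold."""
    return support(itemset, transactions) >= minsup


print(support_count({"Milk", "Bread", "Diaper"}))  # 2
print(support({"Milk", "Bread", "Diaper"}))        # 0.4
```

With a minsup of 0.4 the itemset {Milk, Bread, Diaper} counts as frequent; with a minsup of 0.5 it does not.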
Definition: Association Rule
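• Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s)
•Fraction of transactions that contain both X and Y
– Confidence (c)
•Measures how often items in Y appear in transactions that contain X
Example: {Milk, Diaper} → {Beer}
$$s = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{|T|} = \frac{2}{5} = 0.4$$
$$c = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{\sigma(\{\text{Milk, Diaper}\})} = \frac{2}{3} \approx 0.67$$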
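As a quick check of the two rule metrics, the sketch below computes support and confidence for the rule {Milk, Diaper} → {Beer} on the market-basket transactions. The function name `rule_metrics` is a hypothetical helper introduced here for illustration.

```python
# Market-basket transactions from the slides, one set of items per TID.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # sigma(X u Y)
    n_x = sum(1 for t in transactions if x <= t)         # sigma(X)
    s = n_xy / len(transactions)
    c = n_xy / n_x if n_x else 0.0  # confidence is undefined when X never occurs
    return s, c


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)
print(s)            # 0.4
print(round(c, 2))  # 0.67
```

Returning 0.0 for an antecedent that never occurs is a convenience of this sketch, not a convention taken from the slides.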
Association Rule Mining Task
• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
Computational Complexity
• Given d unique items:
– Total number of itemsets = $2^d$
– Total number of possible association rules:
$$R = \sum_{k=1}^{d-1} \left[ \binom{d}{k} \times \sum_{j=1}^{d-k} \binom{d-k}{j} \right] = 3^d - 2^{d+1} + 1$$
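To see how fast the rule count grows, the sketch below evaluates the double sum (choose k items for the left-hand side, then j of the remaining d − k items for the right-hand side) and compares it with the closed form $3^d - 2^{d+1} + 1$. The function names are illustrative only.

```python
from math import comb


def rule_count_by_sum(d):
    """Count rules by choosing k antecedent items, then j consequent items."""
    return sum(
        comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
        for k in range(1, d)
    )


def rule_count_closed_form(d):
    """Closed form of the same count: 3^d - 2^(d+1) + 1."""
    return 3**d - 2 ** (d + 1) + 1


for d in (2, 6, 10):
    print(d, rule_count_by_sum(d), rule_count_closed_form(d))
# 2 2 2
# 6 602 602
# 10 57002 57002
```

Even six distinct items already allow 602 candidate rules, which is why listing every rule is impractical for realistic item catalogues.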
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
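The two steps can be sketched as follows. This is a deliberately naive illustration, assuming the market-basket transactions from the earlier slides: step 1 enumerates every itemset rather than using any pruning, and the names `frequent_itemsets` and `generate_rules` as well as the thresholds 0.6 and 0.7 are choices made for this example.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def frequent_itemsets(transactions, minsup):
    """Step 1 (naive): keep every itemset whose support is >= minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count / n >= minsup:
                frequent[frozenset(cand)] = count
    return frequent


def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset into antecedent X and consequent Y."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                # Every subset of a frequent itemset is frequent, so the
                # antecedent's count is already stored in `frequent`.
                conf = count / frequent[x]
                if conf >= minconf:
                    rules.append((set(x), set(itemset - x), conf))
    return rules


freq = frequent_itemsets(TRANSACTIONS, minsup=0.6)
rules = generate_rules(freq, minconf=0.7)
print(len(freq), len(rules))  # 8 8
```

Because both rules from one partition share the same itemset, their support is identical and only the confidence has to be recomputed in step 2.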
Illustrating Apriori Principle
[Figure: itemset lattice over items A, B, C, D, E, from the single items through the pairs (AB ... DE) and triples (ABC ... CDE) up to ABCDE. Once an itemset is found to be infrequent, all of its supersets are pruned.]
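The pruning idea in the lattice (an infrequent itemset rules out all of its supersets) can be sketched as a level-wise search. This is a minimal illustration on the market-basket transactions, not a tuned implementation; the function name `apriori` and the threshold `minsup_count=3` are assumptions made for the example.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def apriori(transactions, minsup_count):
    """Level-wise search; candidates with an infrequent subset are never counted."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count the surviving candidates of size k against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(survivors)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(TRANSACTIONS, minsup_count=3)
print(len(result))  # 8
```

On this data, {Beer, Bread, Diaper} and {Beer, Diaper, Milk} are pruned without being counted because {Beer, Bread} and {Beer, Milk} are infrequent; only {Bread, Diaper, Milk} reaches the counting step at level 3, where it fails the threshold.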
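[Figure: transaction 1 2 3 5 6 expanded by prefix (1 + 2 3 5 6, 2 + 3 5 6, 3 + 5 6; then 1 2 + 3 5 6, 1 3 + 5 6, 1 5 + 6) and matched against 15 candidate 3-itemsets: 1 4 5, 1 2 4, 4 5 7, 1 2 5, 4 5 8, 1 5 9, 1 3 6, 2 3 4, 5 6 7, 3 4 5, 3 5 6, 3 5 7, 6 8 9, 3 6 7, 3 6 8.]
Match transaction against 11 out of 15 candidates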
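For comparison, the sketch below finds which of the 15 candidate 3-itemsets from the figure are actually contained in the transaction 1 2 3 5 6, by enumerating the transaction's 3-subsets and looking each one up in a set. This is a plain lookup, not the structure drawn in the figure, so it does not reproduce the "11 out of 15" comparison count; the name `matching_candidates` is a hypothetical helper.

```python
from itertools import combinations

# The 15 candidate 3-itemsets shown in the figure.
CANDIDATES = {
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
    (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
    (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8),
}
TRANSACTION = (1, 2, 3, 5, 6)


def matching_candidates(transaction, candidates, k=3):
    """Return the k-subsets of `transaction` that appear among `candidates`."""
    return [s for s in combinations(sorted(transaction), k) if s in candidates]


print(matching_candidates(TRANSACTION, CANDIDATES))
# [(1, 2, 5), (1, 3, 6), (3, 5, 6)]
```

Only three candidates are contained in the transaction, so their support counts are the only ones incremented; the remaining comparisons are overhead that a good candidate index tries to avoid.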