Rule Mining and The Apriori Algorithm

[Figure: a binary transaction matrix M whose columns are items I1, I2, I3, ..., I5.]
We want to find all strong rules. These are rules a → b such that:
• Supp(a ∪ b) ≥ θ (the rule is supported by enough transactions), and
• Conf(a → b) = Supp(a ∪ b)/Supp(a) ≥ minconf (the rule holds often enough when a appears).
Apriori finds all frequent itemsets (a such that Supp(a) ≥ θ). We can use
Apriori’s result to get all strong rules a → b as follows: for each frequent itemset ℓ and each nonempty proper subset a ⊂ ℓ, set b = ℓ \ a and output a → b whenever Conf(a → b) = Supp(ℓ)/Supp(a) ≥ minconf. Both supports are already known, since every subset of a frequent itemset is itself frequent. A sketch of this procedure follows below.
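As a minimal Python sketch of this rule-generation step (assuming the frequent itemsets come as a dict from frozenset to support count; the names freq, minconf, and strong_rules are illustrative, not from the notes):

```python
from itertools import combinations

def strong_rules(freq, minconf):
    """Enumerate strong rules a -> b from frequent itemsets.

    freq: dict mapping frozenset -> support count. By downward closure,
          every nonempty subset of a frequent itemset is also a key.
    minconf: minimum confidence threshold.
    """
    rules = []
    for itemset, supp_l in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        # Try every nonempty proper subset of the itemset as antecedent a.
        for r in range(1, len(itemset)):
            for a in combinations(itemset, r):
                a = frozenset(a)
                conf = supp_l / freq[a]  # Conf(a -> b) = Supp(l) / Supp(a)
                if conf >= minconf:
                    rules.append((a, itemset - a, conf))
    return rules
```

No extra database scans are needed here: all the supports were already computed while finding the frequent itemsets.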
Now for Apriori. Use the downward closure property (every subset of a frequent itemset is itself frequent): candidate k-itemsets (itemsets of size k) are generated only from frequent (k − 1)-itemsets. It is a breadth-first search over itemset size.
Example (θ = 10):

[Figure: the seven items a, ..., g are fruits; the legible labels include apples, bananas, cherries, grapes, and elderberries.]

1-itemsets:  a    b    c    d    e    f    g
Supp:        25   20   30   45   29   5    17

f is pruned: Supp(f) = 5 < θ.

2-itemsets:  {a,b}  {a,c}  {a,d}  {a,e}  ...  {e,g}
Supp:        7      25     15     23     ...  3

{a,b} is pruned: Supp({a,b}) = 7 < θ, so no superset of {a,b} is ever generated as a candidate.

4-itemsets:  {a,c,d,e}
Supp:        12
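To make this pruning concrete, here is a small Python sketch using the support counts from the table above (variable names are illustrative):

```python
theta = 10
supp1 = {"a": 25, "b": 20, "c": 30, "d": 45, "e": 29, "f": 5, "g": 17}

# L1 = frequent 1-itemsets; f is pruned because Supp(f) = 5 < theta.
L1 = {i for i, s in supp1.items() if s >= theta}
print(sorted(L1))  # ['a', 'b', 'c', 'd', 'e', 'g']

# Downward closure at work: Supp({a,b}) = 7 < theta, so no superset
# of {a,b} (e.g., {a,b,c}) is ever generated as a candidate.
```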
Apriori Algorithm:
Input: matrix M (transactions × items).
L1 = {frequent 1-itemsets; i such that Supp(i) ≥ θ}.
For k = 2; while Lk−1 ≠ ∅ (while there are large (k − 1)-itemsets); k++:
• Ck = apriori_gen(Lk−1): generate candidate itemsets of size k
• Lk = {c ∈ Ck : Supp(c) ≥ θ}: frequent itemsets of size k (loop over transactions, scanning the database)
end
Output: ⋃k Lk.
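A minimal runnable Python sketch of this loop, under the assumption that the matrix M is given as a list of transaction sets; candidate generation is folded in as a simple join-and-prune (the apriori_gen refinement is described next), and all names are illustrative:

```python
from itertools import combinations

def apriori(transactions, theta):
    """Level-wise (breadth-first) search for all frequent itemsets.

    transactions: list of sets of items (the rows of the matrix M).
    theta: minimum support count.
    Returns {frozenset(itemset): support count}, i.e. the union of all Lk.
    """
    def frequent(candidates):
        # One scan of the database: count each candidate's support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:  # candidate is contained in this transaction
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= theta}

    # L1: frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    Lk = frequent({frozenset([i]) for i in items})
    result = dict(Lk)

    k = 2
    while Lk:  # while there are frequent (k-1)-itemsets
        prev = set(Lk)
        # Join: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        Lk = frequent(candidates)
        result.update(Lk)
        k += 1
    return result

# Example call: apriori([{"a","c","d"}, {"b","c"}, {"a","c","d","e"}], theta=2)
```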
The subroutine apriori_gen joins Lk−1 to itself: two frequent (k − 1)-itemsets whose first k − 2 items agree (when each is written in sorted order) are merged into a candidate k-itemset, and any candidate that has an infrequent (k − 1)-subset is pruned, since by downward closure it cannot be frequent.
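A sketch of this join-and-prune subroutine, assuming items are sortable (e.g., strings) and Lk−1 is given as a set of frozensets; the function name mirrors the notes' apriori_gen, but the implementation details are an assumption:

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    L_prev: set of frozensets, each of size k - 1.
    """
    # Join step: merge two (k-1)-itemsets that share their first k-2
    # items when each is written in sorted order.
    sorted_prev = sorted(tuple(sorted(s)) for s in L_prev)
    candidates = set()
    for i, p in enumerate(sorted_prev):
        for q in sorted_prev[i + 1:]:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(frozenset(p) | {q[-1]})
    # Prune step: drop candidates with an infrequent (k-1)-subset,
    # which by downward closure cannot be frequent.
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}
```

Joining on a shared sorted prefix ensures each candidate is generated exactly once, rather than once per pair of subsets.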
– Confidence:

      Conf(a → b) = P̂(b | a) = Supp(a ∪ b) / Supp(a)

– “Lift”/“Interest”:

      Lift(a → b) = P̂(b | a) / P̂(b) = Supp(a ∪ b) / (Supp(a) · Supp(b))

  (supports here normalized by the number of transactions, so that Supp(b) is the estimate P̂(b); see the numeric sketch after this list)
– Hundreds of other measures have been proposed!
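As a numeric sketch of the first two measures, using the support counts from the example above and a hypothetical database size of n = 100 transactions (n is not given in the notes):

```python
def confidence(supp_ab, supp_a):
    # Conf(a -> b) = Supp(a u b) / Supp(a), an estimate of P(b | a).
    return supp_ab / supp_a

def lift(supp_ab, supp_a, supp_b, n):
    # Lift(a -> b) = P(b|a) / P(b); dividing counts by the number of
    # transactions n turns Supp(b)/n into an estimate of P(b).
    return (supp_ab / supp_a) / (supp_b / n)

# With the example's counts Supp(a) = 25, Supp(c) = 30, Supp({a,c}) = 25,
# and the assumed n = 100:
print(confidence(25, 25))     # 1.0: every transaction with a also has c
print(lift(25, 25, 30, 100))  # 3.33...: a raises the probability of c
```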
Research questions:
• mining more than just itemsets (e.g., sequences, trees, graphs)
• incorporating taxonomy in items
• Boolean logic and “logical analysis of data”
• Cynthia’s question: Can we use rules within ML to get good predictive models?
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.