
Rule Mining and the Apriori Algorithm

MIT 15.097 Course Notes


Cynthia Rudin
The Apriori algorithm is often called the “first thing data miners try,” yet somehow doesn’t appear in most data mining textbooks or courses!

Start with market basket data, encoded as an m × d binary matrix M: row i is a transaction, column j is an item, and M_{i,j} = 1 if transaction i contains item j.

Some important definitions:


• Itemset: a subset of items, e.g., (bananas, cherries, elderberries), indexed by {2, 3, 5}.

• Support of an itemset: the number of transactions containing it,

  Supp(bananas, cherries, elderberries) = Σ_{i=1}^{m} M_{i,2} · M_{i,3} · M_{i,5}.

• Confidence of rule a → b: the fraction of times itemset b is purchased when itemset a is purchased,

  Conf(a → b) = Supp(a ∪ b) / Supp(a) = (#times a and b are purchased) / (#times a is purchased) = P̂(b|a).
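Both quantities can be read straight off the matrix M. Here is a minimal sketch in Python (the toy matrix, column indices, and function names are illustrative assumptions, not from the notes):

```python
import numpy as np

# Toy 0/1 market basket matrix: rows are transactions, columns are items.
M = np.array([
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
])

def supp(M, items):
    """Supp(items): number of transactions containing every item in `items`."""
    return int(M[:, sorted(items)].prod(axis=1).sum())

def conf(M, a, b):
    """Conf(a -> b) = Supp(a u b) / Supp(a), the empirical P-hat(b | a)."""
    return supp(M, set(a) | set(b)) / supp(M, a)

print(supp(M, {1, 4}))    # 3: items 1 and 4 co-occur in three transactions
print(conf(M, {1}, {4}))  # 1.0: every transaction with item 1 also has item 4
```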

We want to find all strong rules. These are rules a → b such that:

Supp(a ∪ b) ≥ θ, and Conf(a → b) ≥ minconf.

Here θ is called the minimum support threshold.

The support has a monotonicity property called downward closure:

If Supp(a ∪ b) ≥ θ then Supp(a) ≥ θ and Supp(b) ≥ θ.

That is, if a ∪ b is a frequent itemset, then so are a and b, because

  Supp(a ∪ b) = #times a and b are purchased ≤ #times a is purchased = Supp(a).

Apriori finds all frequent itemsets (all a such that Supp(a) ≥ θ). We can use Apriori’s result to get all strong rules a → b as follows (a sketch in code follows the list):

• For each frequent itemset ℓ:

  – Find all nonempty subsets of ℓ.
  – For each subset a, output a → {ℓ \ a} whenever Supp(ℓ)/Supp(a) ≥ minconf.
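Here is a minimal sketch of that loop, assuming freq_supports is a dict from each frequent itemset (a frozenset) to its support count, i.e., the shape of Apriori’s output:

```python
from itertools import combinations

def strong_rules(freq_supports, minconf):
    """Emit all strong rules a -> (l - a) with Supp(l)/Supp(a) >= minconf."""
    rules = []
    for l, supp_l in freq_supports.items():
        for r in range(1, len(l)):               # nonempty proper subsets of l
            for subset in combinations(sorted(l), r):
                a = frozenset(subset)
                c = supp_l / freq_supports[a]    # Conf(a -> l - a) = Supp(l)/Supp(a)
                if c >= minconf:
                    rules.append((a, l - a, c))
    return rules
```

Downward closure is what makes the freq_supports[a] lookup safe: every nonempty subset of a frequent itemset is itself frequent, so its support was recorded.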

Now for Apriori. Use the downward closure property: generate all k-itemsets (itemsets of size k) from the (k − 1)-itemsets. It’s a breadth-first search.

Example:
θ = 10. The original figure shows the itemset lattice for items a–g (the fruit labels apples, bananas, cherries, elderberries, grapes, ... appear rotated above the columns); struck-out entries fall below the support threshold and are pruned.

1-itemsets:  a     b     c     d     e     f     g
supp:        25    20    30    45    29    5     17        (f is pruned: 5 < θ)

2-itemsets:  {a,b}   {a,c}   {a,d}   {a,e}   ...   {e,g}
supp:        7       25      15      23            3       ({a,b} and {e,g} are pruned)

3-itemsets:  {a,c,d}   {a,c,e}   {b,d,g}   ...
supp:        15        22        15

4-itemsets:  {a,c,d,e}
supp:        12

Apriori Algorithm:
Input: Matrix M
L1 = {frequent 1-itemsets: items i such that Supp(i) ≥ θ}.
For k = 2, while Lk−1 ≠ ∅ (while there are large (k − 1)-itemsets), k++:

• Ck = apriori_gen(Lk−1)   (generate candidate itemsets of size k)
• Lk = {c ∈ Ck : Supp(c) ≥ θ}   (frequent itemsets of size k; loop over transactions, scanning the database)

end
Output: ⋃k Lk.
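A sketch of this loop in Python, assuming the database is a list of transactions given as sets (rather than the matrix M) and using an apriori_gen like the one sketched after the subroutine’s description below; all names are illustrative:

```python
def apriori(transactions, theta):
    """Return {frozenset: support count} for all frequent itemsets."""
    # L1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for i in t:
            c = frozenset([i])
            counts[c] = counts.get(c, 0) + 1
    freq = {c: s for c, s in counts.items() if s >= theta}
    L_prev, k = set(freq), 2
    while L_prev:                          # while there are frequent (k-1)-itemsets
        Ck = apriori_gen(L_prev, k)        # candidate itemsets of size k
        # One scan of the database: count each candidate's support.
        supp = {c: sum(c <= t for t in transactions) for c in Ck}
        L_prev = {c for c, s in supp.items() if s >= theta}
        freq.update({c: supp[c] for c in L_prev})
        k += 1
    return freq
```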

The subroutine apriori_gen joins Lk−1 with itself.

apriori_gen Subroutine:

Input: Lk−1
Find all pairs of itemsets in Lk−1 whose first k − 2 items are identical.
Union each pair (lexicographically) to get Ck^(too big), e.g.,

{a, b, c, d, e, f}, {a, b, c, d, e, g} → {a, b, c, d, e, f, g}.

Prune: Ck = {c ∈ Ck^(too big) : all (k − 1)-subsets cs of c obey cs ∈ Lk−1}.
Output: Ck.

Example of the Prune step: consider {a, b, c, d, e, f, g}, which is in C7^(too big), and suppose I want to know whether it belongs in C7. Look at the 6-subsets obtained by dropping one item at a time: {b, c, d, e, f, g}, {a, c, d, e, f, g}, {a, b, d, e, f, g}, and so on. If any of them is not in L6, then prune {a, b, c, d, e, f, g} from C7.
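One way to write the subroutine as code, matching the join-then-prune description above (a sketch; itemsets are frozensets of mutually comparable items):

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Join L_{k-1} with itself on the first k-2 items, then prune."""
    prefixes = sorted(tuple(sorted(l)) for l in L_prev)
    candidates = set()
    for i, p in enumerate(prefixes):
        for q in prefixes[i + 1:]:
            if p[:k - 2] == q[:k - 2]:        # first k-2 items identical
                candidates.add(frozenset(p) | frozenset(q))
    # Prune: keep c only if every (k-1)-subset of c is in L_{k-1}.
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}
```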

The Apriori Algorithm


Database D (minimum support count = 2):

TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D → C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → C2 supports: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

C3: {2 3 5}
Scan D → {2 3 5}: 2
L3: {2 3 5}: 2
Note: {1,2,3}, {1,2,5}, and {1,3,5} are not in C3 (each has an infrequent 2-subset).

Image by MIT OpenCourseWare, adapted from Osmar R. Zaïane.
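As a sanity check, feeding this database to the apriori sketch from earlier (an assumed helper, not part of the notes) reproduces the tables:

```python
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(transactions, theta=2)
assert freq[frozenset({2, 3, 5})] == 2   # L3
assert frozenset({1, 5}) not in freq     # supp {1 5} = 1 < 2
assert frozenset({4}) not in freq        # supp {4} = 1 < 2
```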



• Apriori scans the database at most how many times?
• It generates a huge number of candidate sets.
• It spawned a huge number of Apriori-like papers.

What do you do with the rules after they’re generated?

• Information overload (give up)
• Order rules by “interestingness”:

  – Confidence: P̂(b|a) = Supp(a ∪ b) / Supp(a)
  – “Lift”/“Interest”: P̂(b|a) / P̂(b) = (Supp(a ∪ b)/Supp(a)) / (Supp(b)/m), where m is the number of transactions
  – Hundreds!
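Given Apriori’s output, lift is one line of code (a sketch; freq_supports is the assumed itemset-to-support dict from earlier and m the number of transactions; all three lookups succeed for any strong rule by downward closure):

```python
def lift(freq_supports, m, a, b):
    """Lift(a -> b) = P(b|a) / P(b) = (Supp(a u b)/Supp(a)) / (Supp(b)/m)."""
    a, b = frozenset(a), frozenset(b)
    return (freq_supports[a | b] / freq_supports[a]) / (freq_supports[b] / m)
```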
Research questions:

• mining more than just itemsets (e.g., sequences, trees, graphs)
• incorporating a taxonomy on the items
• boolean logic and “logical analysis of data”
• Cynthia’s question: can we use rules within ML to get good predictive models?

MIT OpenCourseWare
http://ocw.mit.edu

15.097 Prediction: Machine Learning and Statistics
Spring 2012

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
