
The Apriori Algorithm

Association rule learning,
the Apriori algorithm,
and its implementation

tommyod @ github

Presentation: github.com/tommyod/Efficient-Apriori/blob/master/docs/presentation/apriori.pdf

December 28, 2018

Table of contents

A problem: learning association rules

A solution: the Apriori algorithm

A practical matter: writing a Python implementation

Summary and references

A problem: learning association rules

Motivating example
Example (Learning from transactions)
Consider the following set of transactions.

{eggs, bread, jam, bacon}
{apples, eggs, bacon}
{bacon, bread}
{ice cream, bread, bacon}

What interesting information can we infer from this data?


Examples:
• The itemsets {bacon, bread} and {bacon, eggs} often appear in the
transactions, with counts 3 and 2, respectively.
• The rule {bread} ⇒ {bacon} is meaningful in the sense that
P(bacon|bread) = 1.
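
As a concrete illustration (a sketch added alongside the slides, not part of the original deck), the counts above can be computed in a few lines of Python:

from collections import Counter
from itertools import combinations

transactions = [
    {"eggs", "bread", "jam", "bacon"},
    {"apples", "eggs", "bacon"},
    {"bacon", "bread"},
    {"ice cream", "bread", "bacon"},
]

# Count every 2-itemset across all transactions.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
print(pair_counts[("bacon", "bread")])  # 3
print(pair_counts[("bacon", "eggs")])   # 2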
Formal problem statement

Problem
Given a database T = {t1, t2, . . . , tm}, where the ti are transactions, and a set of
items I = {i1, i2, . . . , in}, learn meaningful rules X ⇒ Y, where X, Y ⊂ I.

To accomplish this, we need measures of the meaningfulness of association rules.

Properties of association rules

Definition (Support)
The support of an association rule X ⇒ Y is the frequency with which X ∪ Y
appears in the transactions T, i.e. support(X ⇒ Y) := P(X, Y).

• There is no reason to distinguish between the support of an itemset and the
support of an association rule, i.e. support(X ⇒ Y) = support(X ∪ Y).
• An important property of support is that
support({eggs, bacon}) ≤ support({bacon}).

More formally, we observe that:

Theorem (Downward closure property of sets)
If s ⊂ S, then support(s) ≥ support(S).
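
In Python, reusing the transactions list from the earlier sketch, support can be computed as follows (an illustration, not the library's implementation):

def support(itemset, transactions):
    """Fraction of the transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({"eggs", "bacon"}, transactions))  # 0.5
print(support({"bacon"}, transactions))          # 1.0 (downward closure)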

Properties of association rules
Definition (Confidence)
The confidence of the association rule X ⇒ Y is given by

confidence(X ⇒ Y) = P(Y | X) = P(X, Y) / P(X) = support(X ⇒ Y) / support(X).

Notice the following interesting property.


Example
The confidence of {A, B} ⇒ {C} is always greater than or equal to that of
{A} ⇒ {B, C}. By definition we have

support({A, B} ⇒ {C}) / support({A, B}) ≥ support({A} ⇒ {B, C}) / support({A}),

where the numerators are identical and support({A}) ≥ support({A, B}).


Properties of association rules
Definition (Confidence)
The confidence of the association rule X ⇒ Y is given by

confidence(X ⇒ Y) = P(Y | X) = P(X, Y) / P(X) = support(X ⇒ Y) / support(X).

Theorem (Downward closure property of rules)
Consider the rules (X − y) ⇒ y and (X − Y) ⇒ Y, where y ⊂ Y ⊂ X. Then

confidence((X − y) ⇒ y) ≥ confidence((X − Y) ⇒ Y)

Proof. The numerators are both support(X), while for the denominators we have
support(X − y) ≤ support(X − Y) by the downward closure property of sets,
since (X − Y) ⊂ (X − y).
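
Expressed in Python, reusing support() and the transactions from the earlier sketches:

def confidence(lhs, rhs, transactions):
    """confidence(lhs => rhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(confidence({"bread"}, {"bacon"}, transactions))  # 1.0
print(confidence({"bacon"}, {"bread"}, transactions))  # 0.75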
Examples of support and confidence

Example (Support and confidence of a rule)


Consider again the following set of transactions.

{eggs, bread, jam, bacon}
{apples, eggs, bacon}
{bacon, bread}
{ice cream, bread, bacon}
• The rule {bread} ⇒ {bacon} has support 3/4, confidence 1.
– Support 3/4 since {bread, bacon} appears in 3 of the transactions.
– Confidence 1 since {bread} appears 3 times, and in 3 of those
{bacon} also appears.

A naive algorithm

Example (Naive algorithm for learning rules)


for every size k = 1, . . . , |I|
    for every subset of size k
        for every split of this subset into X ⇒ Y
            compute the support and confidence of the rule
            by counting occurrences in the transactions
• A fantastic starting point for an algorithm, since it (1) clearly terminates in
finite time, (2) is simple to implement and (3) will run reasonably fast on
small problem instances.
• Terribly slow on realistic problem instances, since it must check every
possible itemset against every transaction.
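
A direct, deliberately naive transcription into Python, reusing support() and confidence() from the earlier sketches:

from itertools import chain, combinations

def naive_rules(transactions, min_support, min_confidence):
    """Enumerate every itemset and every split; correct but exponential in |I|."""
    items = sorted(set(chain.from_iterable(transactions)))
    rules = []
    for k in range(2, len(items) + 1):
        for subset in combinations(items, k):
            if support(subset, transactions) < min_support:
                continue
            for r in range(1, k):
                for lhs in combinations(subset, r):
                    rhs = tuple(i for i in subset if i not in lhs)
                    if confidence(lhs, rhs, transactions) >= min_confidence:
                        rules.append((lhs, rhs))
    return rules

print(naive_rules(transactions, min_support=0.5, min_confidence=1.0))
# [(('bread',), ('bacon',)), (('eggs',), ('bacon',))]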

A solution: the Apriori algorithm

Overview of Apriori
• Split the problem into two distinct phases.
– Finding meaningful (high support) itemsets.
– Generating meaningful (high confidence) rules.
• Phase 1
– The user specifies a desired minimum support.
– The algorithm exploits the downward closure property, i.e.
support(S) ≤ support(s) if s ⊂ S.
∗ No reason to check S if s has low support.
– Bottom-up approach to subset generation.
• Phase 2
– The user specifies a desired minimum confidence.
– Also exploits the above downward closure property.
– Bottom-up approach to rule generation.

Phase 1: Generating itemsets (example 1)
Example (Itemset generation via Apriori)
Consider again the following set of transactions.
{eggs, bread, jam, bacon}
{apples, eggs, bacon}
{bacon, bread}
{ice cream, bread, bacon}
• We set the minimum support to 50 %.
– Itemsets of size 1 with the desired support are
{bacon}, {bread} and {eggs}. They are called large itemsets of size 1.
– From these, we can form
{bacon, bread}, {bacon, eggs} and {bread, eggs}. These are
candidate itemsets of size 2.
– Large itemsets of size 2: {bacon, bread} and {bacon, eggs}.
Phase 1: Generating itemsets (example 2)

Example

Iteration 1

Transactions: {1, 2, 7, 4}, {2, 3, 4}, {1, 6, 3}, {1, 2, 4, 5}

• Running the algorithm with minimum support 50 %.
• Candidate itemsets of size 1:
– {1}, {2}, {3}, {4}, {5}, {6}, {7}
• Large itemsets of size 1:
– {1}, {2}, {3}, {4}

Phase 1: Generating itemsets (example 2)

Example

Iteration 2

Transactions: {1, 2, 7, 4}, {2, 3, 4}, {1, 6, 3}, {1, 2, 4, 5}

• Running the algorithm with minimum support 50 %.
• Candidate itemsets of size 2:
– {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}
• Large itemsets of size 2:
– {1, 2}, {1, 4}, {2, 4}

Phase 1: Generating itemsets (example 2)

Example

Iteration 3

Transactions: {1, 2, 7, 4}, {2, 3, 4}, {1, 6, 3}, {1, 2, 4, 5}

• Running the algorithm with minimum support 50 %.
• Candidate itemsets of size 3:
– {1, 2, 4}
• Large itemsets of size 3:
– {1, 2, 4}

Phase 1: Pseudocode
Algorithm sketch

Create L1, the set of large itemsets of size 1
j = 1
while Lj is not empty do:
    create every candidate set Cj+1 from Lj
    prune candidates from Cj+1 a priori (every subset must be in Lj)
    for every transaction ti ∈ T do:
        count occurrences of every set in Cj+1 in ti
    let Lj+1 be the sets in Cj+1 whose support is above the minimum
    j = j + 1

Iterating through the transactions and checking every possible candidate in Cj+1
is expensive. Optimizations: choosing good data structures, pruning transactions.
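
A compact, unoptimized Python sketch of this loop, assuming each transaction is a set (the library's itemsets_from_transactions is considerably more optimized):

def itemsets_apriori(transactions, min_support):
    """Sketch of Phase 1: bottom-up generation of large itemsets."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    large = {frozenset({item}) for item in items
             if sum(item in t for t in transactions) / n >= min_support}
    all_large = set(large)
    while large:
        # Join step: combine large j-itemsets whose union has size j + 1.
        candidates = {a | b for a in large for b in large
                      if len(a | b) == len(a) + 1}
        # Prune step (a priori): every j-subset of a candidate must be large.
        candidates = {c for c in candidates
                      if all(c - {item} in large for item in c)}
        # Count occurrences with a single pass over the transactions.
        large = {c for c in candidates
                 if sum(c <= t for t in transactions) / n >= min_support}
        all_large |= large
    return all_large

print(itemsets_apriori([{1, 2, 7, 4}, {2, 3, 4}, {1, 6, 3}, {1, 2, 4, 5}], 0.5))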

Phase 1: Pseudocode - Details on candidates and pruning

create every candidate set Cj+1 from Lj
prune candidates from Cj+1 a priori (every subset must be in Lj)

Example
Given the large itemsets of size 3
{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}.
• Naive candidates are
{2, 3, 4, 5}, {1, 3, 4, 5}, {1, 2, 4, 5}, {1, 2, 3, 5}, {1, 2, 3, 4}.
• Apriori-gen candidates are {1, 2, 3, 4} and {1, 3, 4, 5}, generated efficiently by
keeping the itemsets sorted.
• While the itemset {1, 2, 3, 4} is kept, {1, 3, 4, 5} is discarded, since the
subset {1, 4, 5} ⊂ {1, 3, 4, 5} is not among the large itemsets of size 3.

The example above is from page 4 in the referenced paper.
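
A sketch of the two steps, in the spirit of the library's join_step and prune_step (the actual signatures may differ), with itemsets kept as sorted tuples:

from itertools import combinations

def join_step(itemsets):
    """Join sorted j-itemsets that share their first j - 1 items."""
    itemsets = sorted(itemsets)
    return [a + (b[-1],) for i, a in enumerate(itemsets)
            for b in itemsets[i + 1:] if a[:-1] == b[:-1]]

def prune_step(candidates, large):
    """Keep candidates whose every subset of one size smaller is large."""
    large = set(large)
    return [c for c in candidates
            if all(sub in large for sub in combinations(c, len(c) - 1))]

L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
print(join_step(L3))                  # [(1, 2, 3, 4), (1, 3, 4, 5)]
print(prune_step(join_step(L3), L3))  # [(1, 2, 3, 4)]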

Phase 1: Pseudocode - Details on counting occurrences

for every transaction ti ∈ T do:
    count occurrences of every set in Cj+1 in ti
Example
Check if A = {1, 3, 7} is a subset of B = {1, 2, 3, 5, 7, 9}.
• A naive computation checks if every element of A is found in B. This has
computational complexity O(|A||B|), where |A| is the size of A.
• A better approach is to use binary search when B is sorted. The
computational complexity becomes O(|A| log2 |B|).
• Using hash tables (e.g. the built-in set.issubset in Python), the
computational complexity drops to O(|A|) on average.
For the given example, this resolves to approximately 18, 8 and 3 operations.
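
The hash table approach is exactly what Python's built-in sets provide:

A = {1, 3, 7}
B = {1, 2, 3, 5, 7, 9}

# Each membership test is an O(1) hash lookup on average,
# so the whole check costs roughly |A| operations.
print(A.issubset(B))  # True
print(A <= B)         # equivalent operator form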

Phase 2: Building association rules (example)
• In practice this step is much faster than Phase 1.
• The efficient algorithm exploits the downward closure property.
Example
Consider rules made from ABCD. First the algorithm tries to move itemsets of
size 1 to the right hand side, i.e. one of {{A}, {B}, {C}, {D}}.

BCD ⇒ A ACD ⇒ B
ABD ⇒ C ABC ⇒ D

Assume that only ABC ⇒ D and ACD ⇒ B had high enough confidence. Then
the only rule created from ABCD with a size 2 itemset on the right hand side
worth considering is AC ⇒ BD. This is a direct result of the downward closure
property.
In the implementation this is a recursive function, which we will not examine in detail here.
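
The sketch below expresses the same idea iteratively, level by level, reusing confidence() and the transactions from the earlier sketches (the library's recursive _ap_genrules differs in its details):

def rules_from_itemset(itemset, transactions, min_confidence):
    """Sketch of Phase 2 for a single large itemset."""
    itemset = frozenset(itemset)
    rules = []
    consequents = [frozenset({item}) for item in itemset]
    while consequents:
        kept = []
        for rhs in consequents:
            lhs = itemset - rhs
            if lhs and confidence(lhs, rhs, transactions) >= min_confidence:
                rules.append((lhs, rhs))
                kept.append(rhs)
        # Downward closure: only join surviving consequents, since any
        # superset of a failed consequent has even lower confidence.
        consequents = list({a | b for a in kept for b in kept
                            if len(a | b) == len(a) + 1})
    return rules

print(rules_from_itemset({"bacon", "bread"}, transactions, min_confidence=1.0))
# [(frozenset({'bread'}), frozenset({'bacon'}))]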
The Apriori algorithm on real data

Consider the following data set, with 32,561 rows.

Education   Marital-status         Relationship   Race   Sex     Income  Age

Bachelors   Never-married          Not-in-family  White  Male    ≤50K    middle-aged
Bachelors   Married-civ-spouse     Husband        White  Male    ≤50K    old
HS-grad     Divorced               Not-in-family  White  Male    ≤50K    middle-aged
11th        Married-civ-spouse     Husband        Black  Male    ≤50K    old
Bachelors   Married-civ-spouse     Wife           Black  Female  ≤50K    young
...         ...                    ...            ...    ...     ...     ...
Masters     Married-civ-spouse     Wife           White  Female  ≤50K    middle-aged
9th         Married-spouse-absent  Not-in-family  Black  Female  ≤50K    middle-aged
HS-grad     Married-civ-spouse     Husband        White  Male    >50K    old
Masters     Never-married          Not-in-family  White  Female  >50K    middle-aged

The data may be found at https://archive.ics.uci.edu/ml/datasets/adult.

The Apriori algorithm on real data

Some rules are obvious in retrospect:

{Husband} ⇒ {Male}
{≤ 50K, Husband} ⇒ {Male}
{Husband, middle-aged} ⇒ {Male, Married-civ-spouse}

Some are more interesting:

{HS-grad} ⇒ {≤ 50K}
{≤ 50K, young} ⇒ {Never-married}
{Husband} ⇒ {Male, Married-civ-spouse, middle-aged}

The meaningfulness of a rule may be measured by confidence, lift and conviction.
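
Mining such rules with the library might look like the sketch below; the thresholds are illustrative, and in practice the rows would be read from the CSV file:

from efficient_apriori import apriori

# Each row of the table above becomes one transaction of attribute values.
transactions = [
    ("Bachelors", "Never-married", "Not-in-family", "White", "Male", "<=50K", "middle-aged"),
    ("Bachelors", "Married-civ-spouse", "Husband", "White", "Male", "<=50K", "old"),
    # ... the remaining rows, loaded from the adult dataset ...
]

itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=0.9)
for rule in rules:
    print(rule)  # e.g. {Husband} -> {Male}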

A practical matter: writing a Python implementation

Overview of workflow

• Write simple functions first, i.e. the building blocks (e.g. pruning)
• Add doctests and unit tests (e.g. examples from paper)
• Implement a naive, but correct algorithm
• Implement an asymptotically fast algorithm
• Test the preceding two implementations against each other
• Optimize implementation by profiling the code (find bottlenecks)

Understand → Naive algorithm → Asymptotically fast → Further optimizations

Software testing

• Unit tests
– Test a simple function f(xi) = yi for known cases i = 1, 2, . . .
– Doubles as documentation when written as doctests in Python
• Property tests
– Fix a property, e.g. f(a, b) = f(b, a) for every a, b
– Generate many random inputs a, b to make sure the property holds
(see the sketch below)
• Testing against R, Wikipedia, etc.
– Generate some inputs and test against the arules package
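
A minimal sketch of such a property test, assuming the efficient_apriori API (a real suite might instead use the hypothesis library): the learned itemsets must not depend on the order of the transactions.

import random
from efficient_apriori import apriori

items = list("abcdef")
for seed in range(100):
    rng = random.Random(seed)
    transactions = [tuple(rng.sample(items, rng.randint(1, 4)))
                    for _ in range(20)]
    shuffled = list(transactions)
    rng.shuffle(shuffled)

    itemsets_a, _ = apriori(transactions, min_support=0.25, min_confidence=0.5)
    itemsets_b, _ = apriori(shuffled, min_support=0.25, min_confidence=0.5)
    assert itemsets_a == itemsets_b, f"order dependence for seed {seed}"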

Software structure

apriori
├── Phase 1: itemsets_from_transactions
│   └── apriori_gen
│       ├── join_step
│       └── prune_step
└── Phase 2: generate_rules_apriori
    └── _ap_genrules

Software found at https://github.com/tommyod/Efficient-Apriori.


Summary and references

Summary and references

The Apriori algorithm discovers frequent itemsets in Phase 1, and meaningful
association rules in Phase 2. Both phases employ clever bottom-up algorithms.
By application of the downward closure property of itemsets (support) and rules
(confidence), candidates may be pruned prior to expensive computations.

• The Python implementation
– github.com/tommyod/Efficient-Apriori
• The original paper
– Agrawal, R. and Srikant, R., Fast Algorithms for Mining Association Rules, VLDB 1994.
http://www.cse.msu.edu/~cse960/Papers/MiningAssoc-AgrawalAS-VLDB94.pdf
