Association Rule Mining
Adapted from
Data Mining Concepts and Techniques by
Han, Kamber & Pei
Outline
Basic Concepts
What Is Association Rule Mining
What Is Frequent Pattern Analysis?
Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that appears frequently in a data set
e.g. a set of items, such as milk and bread, that appear frequently
together in a transaction data set is a frequent itemset
First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
Frequent itemset mining (finding frequent patterns) leads to the discovery
of associations and correlations among items in large transactional data
sets
A typical example of frequent itemset mining is market basket analysis.
It is the process of analyzing customer buying habits by finding associations
between the different items that customers place in their shopping baskets
Frequent patterns are presented in the form of association rules
Market Basket Analysis
Applications
The discovery of interesting correlations among huge
amounts of transaction data helps in business decision-making
processes such as catalog design, cross-marketing,
and customer shopping behavior analysis
Products that are frequently purchased together can be
bundled together, and a discount can be offered to increase
sales
Design store layout
Strategy 1: Items that are purchased together can be
placed in proximity
Strategy 2: Items can be placed at opposite ends of the store
to entice customers who purchase such items to pick up
other items along the way
Basic Concepts: Frequent Patterns
Itemset: a set of items. An itemset that contains k items is called a
k-itemset, X = {x1, …, xk}

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

Absolute support, or support count, of X: frequency of occurrence of an
itemset X
{Beer, Diaper} support count is 3
Relative support, or support, s, is the fraction of transactions that
contain X (i.e., the probability that a transaction contains X)
{Beer, Diaper} support is 3/5
An itemset X is frequent if X's support is not less than a minsup threshold

[Figure: Venn diagram of customers who buy beer, customers who buy diaper,
and customers who buy both]
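The two support measures can be checked directly on the five transactions in the table. The following is a minimal Python sketch; the function names `support_count`, `support`, and `is_frequent` are illustrative choices and do not come from the slides.

```python
# The five transactions from the table, keyed by Tid.
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}

def support_count(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for bought in transactions.values() if items <= bought)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing itemset."""
    return support_count(itemset, transactions) / len(transactions)

def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent if its support is not less than minsup."""
    return support(itemset, transactions) >= minsup

print(support_count({"Beer", "Diaper"}, transactions))  # 3
print(support({"Beer", "Diaper"}, transactions))        # 0.6
```

With an assumed minsup of 0.5, `{Beer, Diaper}` is frequent, while `{Nuts, Milk}` (support 2/5) is not.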
Basic Concepts: Association Rules
Frequent patterns are represented in the form of rules
Support and confidence are the two measures of rule interestingness.
Association rules are represented as follows:
X ⇒ Y [support, confidence]
support, s, probability that a transaction contains X ∪ Y
confidence, c, conditional probability that a transaction having X also
contains Y
e.g. Diaper ⇒ Beer [support = 60%, confidence = 75%]
Support is the percentage of transactions that contain both X and Y
(Diaper and Beer), e.g. a support value of 60% means that 60% of all
the transactions under analysis show that beer and diaper are
purchased together
Confidence is the percentage of transactions containing X that also
contain Y, e.g. a confidence value of 75% means that 75% of the
customers who purchased diaper also bought beer
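Written as formulas, the two measures look as follows. Here P(X ∪ Y) denotes the probability that a transaction contains every item of both X and Y, and the operator names are notation chosen for this sketch:

```latex
\begin{align*}
\operatorname{support}(X \Rightarrow Y) &= P(X \cup Y) \\
\operatorname{confidence}(X \Rightarrow Y) &= P(Y \mid X)
  = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\end{align*}
```

For a rule with 60% support and 75% confidence, it follows that the antecedent X appears in 0.60 / 0.75 = 80% of the transactions.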
Basic Concepts: Association Rules
Example:

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

Association rules
Beer ⇒ Diaper (60%, 100%)
Diaper ⇒ Beer (60%, 75%)
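The support and confidence values of the two example rules can be reproduced from the transaction table. This is a small Python sketch; `count` and `rule_measures` are hypothetical helper names, and the code assumes the antecedent occurs in at least one transaction.

```python
# The five example transactions (Tid 10 to 50).
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_measures(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    both = count(antecedent | consequent)
    support = both / len(transactions)
    # Assumes count(antecedent) > 0; otherwise confidence is undefined.
    confidence = both / count(antecedent)
    return support, confidence

print(rule_measures({"Beer"}, {"Diaper"}))   # (0.6, 1.0)
print(rule_measures({"Diaper"}, {"Beer"}))   # (0.6, 0.75)
```

Both rules share the same support because they cover the same itemset {Beer, Diaper}; only the denominator of the confidence changes (3 transactions with Beer versus 4 with Diaper).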
Apriori: A Candidate Generation & Test Approach
C3 = L2 join L2
   = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}
Subsets of {I1,I3,I5} = {{I1}, {I3}, {I5}, {I1,I3}, {I1,I5}, {I3,I5}}
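The join step, and the subset check that follows it, can be sketched in Python. Note that L2 itself does not appear in the text: the L2 used below is an assumption, chosen because joining it yields exactly the six candidates listed for C3. The function names `join` and `prune` are also illustrative.

```python
from itertools import combinations

def join(L_prev, k):
    """Join step: merge two (k-1)-itemsets that agree on their first k-2 items."""
    ordered = sorted({tuple(sorted(s)) for s in L_prev})
    candidates = set()
    for a, b in combinations(ordered, 2):
        if a[:k - 2] == b[:k - 2]:
            candidates.add(frozenset(a) | frozenset(b))
    return candidates

def prune(candidates, L_prev, k):
    """Prune step: keep a candidate only if all its (k-1)-subsets are frequent."""
    prev = {frozenset(s) for s in L_prev}
    return {
        c for c in candidates
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }

# ASSUMPTION: L2 is not given in the text; this set is consistent with the C3 shown.
L2 = [{"I1", "I2"}, {"I1", "I3"}, {"I1", "I5"},
      {"I2", "I3"}, {"I2", "I4"}, {"I2", "I5"}]

C3 = join(L2, 3)              # the six 3-itemsets listed for C3
C3_pruned = prune(C3, L2, 3)  # candidates whose 2-subsets are all in L2
print(sorted(sorted(c) for c in C3_pruned))
```

Under this assumed L2, {I1,I3,I5} is pruned because its subset {I3,I5} is not frequent, which is why enumerating the subsets of a candidate matters.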
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that
        are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
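The pseudocode above translates fairly directly into Python. The version below is a minimal sketch, not an optimized implementation: candidates are generated as unions of two frequent k-itemsets and then checked against all their k-subsets, which is a simplification of the prefix-based join. The name `apriori` and the parameter `min_count` (an absolute support count) are choices made for this example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_count}

    frequent = dict(Lk)
    k = 1
    while Lk:
        # C_{k+1}: (k+1)-itemsets whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(Lk, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in Lk for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Scan the database and count each candidate contained in a transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # L_{k+1}: candidates that meet the minimum support count.
        Lk = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(Lk)
        k += 1
    return frequent

baskets = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
result = apriori(baskets, min_count=3)  # assumed threshold: 3 of 5 transactions
```

On the five Beer/Diaper transactions with an assumed minimum count of 3, the frequent itemsets are the single items Beer, Nuts, Diaper, and Eggs, plus the pair {Beer, Diaper}.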
Implementation of Apriori
Support = 4000/10000 = 40%
Confidence = 4000/6000 ≈ 66.7%
Misleading rules cont …
In most cases, it is sufficient to focus on a
combination of support, confidence, and either lift
or leverage to quantitatively measure the
"quality" of the rule. However, the real value of a
rule, in terms of usefulness and actionability, is
subjective and depends heavily on the particular
domain and business objectives.
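Lift and leverage both compare the observed co-occurrence of X and Y with what independence would predict: lift is P(X ∪ Y) / (P(X) P(Y)) and leverage is P(X ∪ Y) − P(X) P(Y). The Python sketch below uses the counts from the misleading-rule example (4000 of 10000 transactions contain both itemsets, 6000 contain the antecedent). The consequent count of 7500 is a hypothetical value added only so the example can run; it is not given in the text.

```python
def lift(n_xy, n_x, n_y, n):
    """Lift of X => Y from raw counts; 1.0 means X and Y are independent."""
    return (n_xy / n) / ((n_x / n) * (n_y / n))

def leverage(n_xy, n_x, n_y, n):
    """Leverage of X => Y from raw counts; 0.0 means X and Y are independent."""
    return n_xy / n - (n_x / n) * (n_y / n)

n = 10000     # total transactions (from the example)
n_x = 6000    # transactions containing the antecedent (from the example)
n_xy = 4000   # transactions containing both sides (from the example)
n_y = 7500    # ASSUMPTION: hypothetical count for the consequent

print(round(lift(n_xy, n_x, n_y, n), 3))      # 0.889
print(round(leverage(n_xy, n_x, n_y, n), 3))  # -0.05
```

Under this assumed consequent count, the lift is below 1 and the leverage is negative, so the rule would be misleading despite its 40% support and roughly 66.7% confidence: the two itemsets would occur together less often than expected under independence.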