Class 4 - Associative Analysis
● 34% in supermarkets
● 25% in shopping malls
● 19% in online e-commerce
Why impulse purchases?
● People buy on impulse because of:
○ Emotional reasons (retail therapy)
○ Lack of financial education
○ The belief that it is a good deal
○ The dopamine effect triggered by shopping
What encourages someone to buy something?
● RECOMMENDATION
● ASSOCIATION
Two important questions for retail companies
● How to discover the interests of people who browse the internet, social networks, shopping malls and supermarkets?
● How to associate personal interests with products and/or services in order to encourage consumption?
{Bread} → {Milk}
● Support (s):
○ Fraction of the N transactions that contain all the items of X and Y
● Confidence (c):
○ Measures how frequently the items of Y appear in the transactions that contain X
Consider this association rule: {Milk, Diapers} → {Beer}
s = σ(Milk, Diapers, Beer) / N = 2/5 = 0.4
c = σ(Milk, Diapers, Beer) / σ(Milk, Diapers) = 2/3 = 0.67
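As a minimal sketch (Python; not part of the original slides), both quantities can be computed by direct counting over the five-transaction example used throughout this class:

```python
# Minimal sketch: support and confidence by direct counting.
# `transactions` is the five-transaction example used throughout this class.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]

def sigma(itemset):
    """Support count: how many transactions contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diapers"}, {"Beer"}
N = len(transactions)
s = sigma(X | Y) / N            # 2/5 = 0.4
c = sigma(X | Y) / sigma(X)     # 2/3 = 0.67
print(s, round(c, 2))
```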
How to interpret support and confidence
● Support (s):
○ A rule with low support may hold merely by chance, since its items rarely occur together. E.g. {Diapers, Beer} → {Coke}, s = 1/5
● Confidence (c):
○ Measures the reliability of the inference made by a rule
○ High confidence means that Y is likely to be present in transactions that contain X
○ Rules with high confidence but very low support are generally not of much interest. E.g. {Eggs} → {Beer}
Mining association rules
● Given a set of transactions T, the task of mining association rules consists of finding all rules that satisfy these requirements:
○ support ≥ minsup
○ confidence ≥ minconf
● Brute-force approach (a sketch follows this list):
○ List all possible association rules
○ Compute support and confidence for every rule
○ Discard the rules that do not satisfy the minsup and minconf thresholds
○ Such an approach is computationally prohibitive, since the number of possible rules grows exponentially with the number of items
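A hedged sketch of this brute-force enumeration (function and variable names are my own; it reuses `transactions` and `sigma` from the earlier sketch):

```python
from itertools import combinations

# Brute-force sketch: enumerate every rule X -> Y, compute its support and
# confidence, and keep only the rules above both thresholds.
# Reuses `transactions` and `sigma` from the earlier sketch.
items = sorted(set().union(*transactions))

def nonempty_subsets(s):
    s = sorted(s)
    return (set(c) for r in range(1, len(s) + 1) for c in combinations(s, r))

def brute_force_rules(minsup, minconf):
    N = len(transactions)
    rules = []
    for itemset in nonempty_subsets(items):      # every candidate itemset ...
        for X in nonempty_subsets(itemset):      # ... split into X -> Y
            Y = itemset - X
            if not Y or sigma(X) == 0:
                continue
            s = sigma(itemset) / N
            c = sigma(itemset) / sigma(X)
            if s >= minsup and c >= minconf:
                rules.append((X, Y, s, c))
    return rules

print(len(brute_force_rules(minsup=0.4, minconf=0.6)))  # exponential in d
```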
A two-step approach for mining association rules
1. Frequent itemset generation:
○ Find all itemsets that satisfy support ≥ minsup
2. Rule generation:
○ For each frequent itemset, generate the rules with high confidence
○ Each rule is a binary partition of the itemset (a sketch follows this list)
● All six rules that can be generated from {Milk, Diapers, Beer} are binary partitions of that same itemset
● Rules originating from the same itemset have identical support, but different confidence
● Therefore, if the set {Milk, Diapers, Beer} is infrequent, all 6 rules generated from it will also be infrequent
● Thus, the support and confidence requirements can be considered separately
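A sketch of step 2 for this itemset: enumerating the 2^3 - 2 = 6 binary partitions of {Milk, Diapers, Beer} and their confidences (reusing `transactions` and `sigma` from the first sketch):

```python
from itertools import combinations

# Sketch: the candidate rules of a frequent k-itemset are its 2^k - 2 binary
# partitions. Reuses `transactions` and `sigma` from the earlier sketch.
def rules_from_itemset(itemset):
    itemset = set(itemset)
    for r in range(1, len(itemset)):
        for antecedent in map(set, combinations(sorted(itemset), r)):
            yield antecedent, itemset - antecedent

for X, Y in rules_from_itemset({"Milk", "Diapers", "Beer"}):
    # Same support for all six rules; confidence depends on the antecedent
    print(sorted(X), "->", sorted(Y), "c =", round(sigma(X | Y) / sigma(X), 2))
```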
1: Generation of frequent itemsets
● Given d items, there are 2^d possible itemsets
● Example: d = 5 → 32 itemsets
● Example: d = 100 → 2^100 = 1,267,650,600,228,229,401,496,703,205,376 itemsets
● Apriori principle: if an itemset is frequent, then all of its subsets are also frequent. Formally, support is anti-monotone:
∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
1: Illustration of the Apriori principle
[Figure: support-based pruning of the itemset lattice; the supersets of an infrequent itemset are pruned]
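A quick numeric check of the anti-monotone property on the example data (reusing `transactions` and `sigma` from the first sketch):

```python
# Anti-monotonicity check: support can only shrink as the itemset grows.
# Reuses `transactions` and `sigma` from the earlier sketch.
N = len(transactions)
chain = [{"Milk"}, {"Milk", "Diapers"}, {"Milk", "Diapers", "Beer"}]
supports = [sigma(s) / N for s in chain]
print(supports)                                        # [0.8, 0.6, 0.4]
assert all(a >= b for a, b in zip(supports, supports[1:]))
```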
1: Example of itemset generation using Apriori
minsup = 0.60 (minimum count = 3)

Transactions:
TID  Purchases
1    Bread, Milk
2    Bread, Diapers, Beer, Eggs
3    Milk, Diapers, Beer, Coke
4    Bread, Milk, Diapers, Beer
5    Bread, Milk, Diapers, Coke

Candidate 1-itemsets (C1):
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diapers  4
Eggs     1

Candidate 2-itemsets (C2), built from the frequent items:
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diapers}  3
{Milk, Beer}      2
{Milk, Diapers}   3
{Beer, Diapers}   3

Candidate 3-itemsets (C3), built from the frequent pairs:
Itemset                  Count
{Bread, Milk, Diapers}   2
{Bread, Milk, Beer}      1
{Milk, Diapers, Beer}    2
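The level-wise generation above can be reproduced with a short, self-contained Python sketch (a simplified Apriori: the join step below does not include the full subset-pruning test of the original algorithm):

```python
# Self-contained sketch of level-wise frequent-itemset generation.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]

def apriori_itemsets(transactions, minsup):
    N = len(transactions)
    count = lambda s: sum(1 for t in transactions if s <= t)
    # F1: frequent 1-itemsets
    level = [frozenset([i]) for i in set().union(*transactions)
             if count({i}) >= minsup * N]
    frequent = list(level)
    k = 2
    while level:
        # Join step: candidate k-itemsets as unions of frequent (k-1)-itemsets.
        # A full Apriori would also prune candidates with any infrequent subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if count(c) >= minsup * N]
        frequent += level
        k += 1
    return frequent

for s in apriori_itemsets(transactions, minsup=0.60):
    print(sorted(s))    # the frequent items and pairs from the tables above
```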
Case study #1
● Step 1: generation of frequent itemsets (that contain lightbulbs)
● Association rules that must include batteries in the antecedent
Case study #2
● Titanic dataset
● Apriori with minsup = 10% and minconf = 90%
● Frequent itemsets and association rules found (a reconstruction sketch follows)
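The original slides do not say which Titanic file was used; as a hypothetical reconstruction, a comparable run can be made with the copy of the dataset bundled with seaborn and the mlxtend library (both are assumptions, not the slides' setup):

```python
import pandas as pd
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical reconstruction (the slides' exact data file is not specified):
# load the Titanic dataset bundled with seaborn and keep a few attributes.
df = sns.load_dataset("titanic")[["class", "sex", "survived"]]
df["survived"] = df["survived"].map({0: "no", 1: "yes"})

# One-hot encode each categorical value into a binary "item"
onehot = pd.get_dummies(df.astype(str))

itemsets = apriori(onehot, min_support=0.10, use_colnames=True)  # minsup = 10%
rules = association_rules(itemsets, metric="confidence", min_threshold=0.90)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```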
How to evaluate association rules?
● Association rule mining algorithms tend to produce a large number of rules for different values of support and confidence
● However, many of these rules are redundant or uninteresting
○ A rule such as {A,B,C} → {D} is redundant if a shorter rule {A,B} → {D} has the same support and confidence (a filter sketch follows this list)
○ Quantitatively uninteresting patterns:
■ Involve a set of mutually independent items (the occurrence of one event does not affect the probability of the other)
■ Cover very few transactions
○ Qualitatively uninteresting patterns:
■ Are obvious or expected (for most people), e.g., {Bread} → {Milk}
● Measures of interest can be used for pruning or ranking the association rules obtained by an algorithm
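A sketch of the redundancy filter described above, over (antecedent, consequent, support, confidence) tuples such as those produced by the brute-force sketch earlier (the function name is my own):

```python
# Sketch: drop a rule when a shorter rule with a subset of its antecedent and
# the same consequent has identical support and confidence.
def prune_redundant(rules):
    kept = []
    for X, Y, s, c in rules:
        redundant = any(
            X2 < X and Y2 == Y and s2 == s and c2 == c  # X2 < X: proper subset
            for X2, Y2, s2, c2 in rules
        )
        if not redundant:
            kept.append((X, Y, s, c))
    return kept
```

Exact float equality is acceptable here because identical support counts yield identical ratios; `math.isclose` would be safer for metrics computed differently.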
How to evaluate association rules?
● Objective measures:
○ The association rules are ranked using statistical measures computed over the data
○ An association metric is used, such as support, confidence, Laplace, Gini, mutual information, the Jaccard index, etc.
● Subjective measures:
○ The association rules are ranked according to the user's interpretation
○ E.g. a rule is subjectively interesting when it contradicts the user's expectations
● Lift: {Milk, Diapers} → {Beer}
○ Confidence divided by the proportion of instances covered by the consequent
○ Measures the importance of the association independently of support
s = σ(Milk, Diapers, Beer) / N = 2/5 = 0.4
c = σ(Milk, Diapers, Beer) / σ(Milk, Diapers) = 2/3 = 0.67
lift = c / s(Beer) = 0.67 / 0.6 ≈ 1.11
Objective metrics for the evaluation of association rules
● Leverage (L):
○ Proportion of additional instances covered by both antecedent and consequent, above what would be expected if the two were statistically independent: L = s(X,Y) − s(X) · s(Y)
● Conviction:
○ Measures how far antecedent and consequent are from being independent (conviction = 1 when they are independent):
Conviction = P(X) · P(!Y) / P(X, !Y)
(A sketch computing these metrics follows below.)
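A sketch computing lift, leverage and conviction for {Milk, Diapers} → {Beer} on the example transactions (reusing `transactions` and `sigma` from the first sketch):

```python
# Sketch: lift, leverage and conviction for {Milk, Diapers} -> {Beer}.
# Reuses `transactions` and `sigma` from the earlier sketch.
N = len(transactions)
X, Y = {"Milk", "Diapers"}, {"Beer"}

s_xy = sigma(X | Y) / N                 # 0.4
s_x, s_y = sigma(X) / N, sigma(Y) / N   # 0.6, 0.6
c = s_xy / s_x                          # 0.67

lift = c / s_y                          # 0.67 / 0.6 = 1.11
leverage = s_xy - s_x * s_y             # 0.4 - 0.36 = 0.04
conviction = (s_x * (1 - s_y)) / (s_x - s_xy)   # P(X)P(!Y) / P(X,!Y) = 1.2
print(round(lift, 2), round(leverage, 2), round(conviction, 2))
```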
Objective metrics for the evaluation of association rules
● There are many metrics proposed in the literature
● Some of them may be good for certain applications, but not for others
● There is no clear definition of their usefulness...
On transforming nominal and categorical attributes
● Many datasets contain attributes of different types in a list of items
● It is necessary to convert them to a suitable format so they can be explored by associative analysis methods
On transforming nominal and categorical attributes
● Sex: symmetric binary attribute → two binary attributes (M, F)
● Education: categorical attribute → three binary attributes (PG, S, 2G)
● Problem: if the values of a nominal attribute are infrequent (e.g. State), they may not generate frequent items → group the values (see the sketch below)
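A sketch of this conversion with pandas (the column names and values are illustrative):

```python
import pandas as pd

# Sketch: each nominal value becomes its own binary attribute ("item").
df = pd.DataFrame({
    "Sex": ["M", "F", "F"],
    "Education": ["PG", "S", "2G"],
})
items = pd.get_dummies(df)
print(items)  # columns: Sex_F, Sex_M, Education_2G, Education_PG, Education_S
```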
On transforming continuous attributes
● Discretization methods:
○ Equal-width intervals, equal-frequency bins, etc. (see the sketch below)
○ Problem: discretization can generate a large number of attributes
○ Adjusting the optimal ranges is computationally expensive
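A sketch of both strategies with pandas (the `age` values are illustrative):

```python
import pandas as pd

# Sketch: equal-width vs. equal-frequency discretization of a numeric column.
age = pd.Series([22, 25, 31, 38, 44, 52, 61, 70])
equal_width = pd.cut(age, bins=3)   # intervals of equal length
equal_freq = pd.qcut(age, q=3)      # bins with (roughly) equal counts
print(pd.DataFrame({"age": age, "equal_width": equal_width,
                    "equal_freq": equal_freq}))
```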
Case study #3
● Breast Cancer Wisconsin dataset
● Objective: differential diagnosis of breast cancer using characteristics of the cell nuclei present in the images
● Question: what are the antecedents that lead to malignancy?
Associative analysis – advanced topics
● Association rules for infrequent/negatively correlated patterns
● Association rules for sequential (temporal) patterns
● Association rules for graphs