06 Association Rules
Analysis
Descriptive Modelling:
Association Rule Analysis
Dr Daqing Chen
Outline
• What is association analysis (market basket analysis)?
• Key concepts and terminologies:
– Itemset
– k-itemset
– Support count and support of an itemset
– Frequent itemset (large itemset)
– Support, confidence, and lift of an association rule
• Apriori algorithm:
– How it works
– How to use it to generate frequent itemsets and further generate association rules
• Implementation in Python:
– PyCaret, more powerful
– Apyori, simple
{Diapers} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}, ……
Implication: “→” means co-occurrence/correlation, not causality!
20/04/2023 DMA Lecture 06
Association Rule Mining: Basic Concepts
• Item: A distinct object or a unique attribute-value pair (Recall:
data matrix for transaction records, Item=Beer, Item=Diapers, …)
• Itemset: A collection of one or more items
• k-itemset: An itemset that contains k items
• Support count of an itemset: The frequency of occurrence of an
itemset in a dataset – Simply COUNT!
• Support: Fraction of transactions that contain a certain itemset
in a dataset
• Frequent itemset: An itemset whose support is not less than a
pre-defined minimum support threshold, also called large
itemset
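The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the transactions are invented toy data, and the function names (support_count, support, is_frequent) are hypothetical helpers chosen for readability.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """Frequent means: support is not less than the minimum support threshold."""
    return support(itemset, transactions) >= min_support


pair = {"Diapers", "Beer"}  # a 2-itemset
print(support_count(pair, transactions))     # 3
print(support(pair, transactions))           # 0.6
print(is_frequent(pair, transactions, 0.5))  # True
```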
d     2^d
5     32
10    1024
20    1048576
40    1.1E+12
If d = 6, R = 602 rules
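The growth shown above can be reproduced with a short sketch. The closed form used for the number of rules, 3^d - 2^(d+1) + 1, is a standard counting result and is an addition here, not something stated on the slide; it counts rules X → Y where X and Y are non-empty and disjoint. The function names are hypothetical.

```python
def num_itemsets(d):
    """Number of subsets of d distinct items (including the empty set): 2^d."""
    return 2 ** d


def num_rules(d):
    """Number of rules X -> Y with X and Y non-empty and disjoint.

    Each item goes to X, to Y, or to neither (3^d assignments); subtract the
    assignments with X empty (2^d) and with Y empty (2^d), then add back the
    one assignment counted twice (both empty).
    """
    return 3 ** d - 2 ** (d + 1) + 1


for d in (5, 10, 20, 40):
    print(d, num_itemsets(d))

print(num_rules(6))  # 602
```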
[Figure: itemset lattice over items A to E (2-itemsets AB ... DE, 3-itemsets ABC ... CDE); once an itemset is found to be infrequent, its supersets are pruned.]
Apriori Algorithm: Reducing Number of
Candidate Itemsets
Given d distinct items and a pre-defined minimum
support threshold
Apriori Algorithm for Searching and Generating Frequent Itemsets
1: Set k = 1
2: Repeat
List all individual candidate k-itemsets.
Count the support for each itemset. Select only the frequent k-itemsets that
satisfy the predefined minimum support threshold. Ignore any infrequent
itemsets.
Use the remaining frequent k-itemsets to generate candidate (k+1)-itemsets
k= k+1
3: Until k= d
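The steps above can be sketched in plain Python as follows. This is a minimal, unoptimised illustration under stated assumptions: the transactions are invented toy data, the function name apriori_frequent_itemsets is hypothetical, and the loop stops as soon as no candidates remain, which is an early exit added here rather than the "until k = d" condition in the pseudocode.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # k = 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:  # assumption: stop early once nothing is left to test
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Keep only the candidates that meet the minimum support threshold.
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets and discard
        # any candidate that has an infrequent k-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return frequent


result = apriori_frequent_itemsets(transactions, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With this toy data and a threshold of 0.6, {Bread, Beer} is infrequent, so no 3-itemset containing both items is ever counted.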
We know ≤ ≤ , so which of the above has the biggest value?
Rule Generation
• In other words, if the confidence of a rule is low, then any
rule generated from the same frequent itemset that contains
that rule's consequent (RHS) within its own consequent will
also have low confidence
• Important fact: For a given association rule, moving
items from the antecedent to the consequent never
changes support, and never increases confidence
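The important fact above can be checked numerically with a small sketch. The transactions are invented toy data and the helper names are hypothetical; the example moves one item from the antecedent to the consequent of a rule built from the same itemset and compares support and confidence.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rule_support(antecedent, consequent):
    """Support of the rule: fraction of transactions containing both sides."""
    return support_count(antecedent | consequent) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)


# Rule {Bread, Milk} -> {Diapers}
s1 = rule_support({"Bread", "Milk"}, {"Diapers"})
c1 = confidence({"Bread", "Milk"}, {"Diapers"})

# Move "Milk" to the consequent: {Bread} -> {Milk, Diapers}
s2 = rule_support({"Bread"}, {"Milk", "Diapers"})
c2 = confidence({"Bread"}, {"Milk", "Diapers"})

print(s1, s2)  # support is unchanged
print(c1, c2)  # confidence does not increase
```

The support is unchanged because both rules are built from the same itemset; the confidence cannot increase because the smaller antecedent is contained in at least as many transactions.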
[Figure: pruned rules]
Discussion
• Data format: binary, nominal
• Attribute-value pairs and transactions: data matrix
• Support count - essential
• Confidence is not necessarily the best measure; other measures have
been devised, e.g., lift, which measures correlation
$$\mathrm{Lift} = \frac{\mathrm{Conf}\{X \rightarrow Y\}}{\mathrm{sup}(Y)} = \frac{\mathrm{sup}(X \cup Y)}{\mathrm{sup}(X)\,\mathrm{sup}(Y)} = \frac{\Pr(X \cup Y)}{\Pr(X)\,\Pr(Y)}$$
• Lift = 1: X and Y are independent, and items are randomly purchased together.
• Lift < 1: negatively associated - the occurrence of X inhibits the occurrence of Y.
• Lift > 1: positively associated - the occurrence of X prompts the occurrence of Y,
and items are purchased together more often than random.
S(bread) = 1650/2000 = 82.5%, S(milk) = 1200/2000 = 60%
C(milk → bread) = 900/1200 = 75%, C(bread → milk) = 900/1650 = 54.5%
Lift = 0.45/(0.6 × 0.825) = 0.91 < 1
Negatively associated: buying one item results in a decrease in buying the
other item
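The bread and milk example above can be reproduced with a few lines of Python. The counts (2000 transactions, 1650 with bread, 1200 with milk, 900 with both) are taken from the example; the variable names are illustrative only.

```python
# Counts taken from the bread and milk example above.
n_total = 2000  # all transactions
n_bread = 1650  # transactions containing bread
n_milk = 1200   # transactions containing milk
n_both = 900    # transactions containing both

sup_bread = n_bread / n_total
sup_milk = n_milk / n_total
sup_both = n_both / n_total

conf_milk_to_bread = n_both / n_milk
conf_bread_to_milk = n_both / n_bread

# Lift is symmetric: the same value results for milk -> bread and bread -> milk.
lift = sup_both / (sup_milk * sup_bread)

print(f"S(bread) = {sup_bread:.1%}, S(milk) = {sup_milk:.1%}")
print(f"C(milk -> bread) = {conf_milk_to_bread:.1%}")
print(f"C(bread -> milk) = {conf_bread_to_milk:.1%}")
print(f"Lift = {lift:.2f}")
```

Although C(milk → bread) is as high as 75%, it is still below S(bread) = 82.5%, which is why the lift falls below 1.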
Discussion
• How to set an appropriate minimum support
threshold?
– If it is set too high, we could miss itemsets involving
interesting rare items (e.g., expensive products in transaction
records; unit failures in student records)
– If it is set too low, the search becomes computationally expensive
because the number of itemsets to generate is very large
• Using a single minimum support threshold may not be
effective
• Using the support count or support of each distinct
item as a reference
Using the Support Count or Support of Each
Distinct Item as a Reference
• What would be an appropriate min support threshold in order to find
out any association rules relating to item C?
• What would be an appropriate min support threshold in order to find
out any association rules relating to item E?
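One possible reading of the questions above, sketched in Python: any itemset that contains a given item cannot have a higher support than the item itself, so the minimum support threshold has to be no greater than that item's support for rules involving it to be found. The transactions below are invented and are not the data behind the questions; the function name is hypothetical.

```python
from collections import Counter

# Hypothetical transactions over items A to E, invented for illustration;
# they are NOT the data behind the questions above.
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B"},
    {"B", "D"}, {"A", "B", "E"}, {"A", "D"}, {"A", "B"}, {"B", "C"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
item_support = {item: count / n for item, count in item_counts.items()}


def max_useful_min_support(item):
    """Largest threshold at which `item` itself is still a frequent 1-itemset.

    Any itemset containing `item` has support <= item_support[item], so a
    higher threshold rules out every rule involving `item`.
    """
    return item_support[item]


for item in sorted(item_support):
    print(item, item_counts[item], max_useful_min_support(item))
```

In this toy data a threshold above 0.3 would hide every rule involving C, and a threshold above 0.1 would hide every rule involving E.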