
Data Mining and Analysis

Descriptive Modelling: Association Rule Analysis

Dr Daqing Chen
Outline
• What is association analysis (market basket analysis)?
• Key concepts and terminologies:
– Itemset
– k-itemset
– Support count and support of an itemset
– Frequent itemset (large itemset)
– Support, confidence, and lift of an association rule
• Apriori algorithm:
– How it works
– How to use it to generate frequent itemsets and further generate association rules
• Implementation in Python:
– PyCaret, more powerful
– Apyori, simple



Association Rule Mining: Basic Concepts
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items in
the transaction
• Also known as market basket analysis or affinity analysis; can be used
for cross-selling, planning store layout, recommendations, etc.
• More generally, find associations or links between different
attributes or attribute-value pairs
[Table: market basket transactions]

Example of Association Rules

{Diapers} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}, ……

Implication: "→" means co-occurrence/correlation, not causality!
Association Rule Mining: Basic Concepts
• Item: A distinct object or a unique attribute-value pair (Recall:
data matrix for transaction records, Item=Beer, Item=Diapers, …)
• Itemset: A collection of one or more items
• k-itemset: An itemset that contains k items
• Support count of an itemset: The frequency of occurrence of an
itemset in a dataset – simply COUNT!
• Support: The fraction of transactions in a dataset that contain a
certain itemset
• Frequent itemset: An itemset whose support is not less than a
pre-defined minimum support threshold; also called a large
itemset



Basic Concepts: An Example
• Distinct items: Bread, Milk, Diapers, Beer, Eggs, Coke
• Itemsets:
– 1-itemsets:
{Bread}, {Milk}, {Diapers}, {Beer}, {Eggs}, {Coke} (each distinct item is a 1-itemset)
– 2-itemsets:
{Bread, Milk}, {Bread, Diapers}, {Bread, Beer}, {Bread, Eggs}, {Bread, Coke}, {Milk,
Diapers}, {Milk, Beer}, {Milk, Eggs}, {Milk, Coke}, … (all possible combinations of any
two of the six distinct items)
– 3-itemsets: {Bread, Milk, Diapers} …
• What is the max size of the itemsets that can be extracted from the dataset,
i.e., the max number of items in an itemset?
– The 6-itemset {Bread, Milk, Diapers, Beer, Eggs, Coke}
• How many itemsets in total can be created? 2^n (including the empty set; here 2^6 = 64)
• Support count({Bread, Milk, Diapers}) = ? 2
• Support({Bread, Milk, Diapers}) = ? 2/5 = 0.4 = 40%
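To make the counting concrete, here is a minimal Python sketch. The five transactions are an assumption (the transaction table is not preserved above); they are the classic market-basket example whose counts match this slide's answers.

```python
# Minimal sketch: counting support for an itemset.
# The five transactions below are an assumption, chosen so the counts
# match this slide (support count 2, support 2/5 for {Bread, Milk, Diapers}).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

itemset = {"Bread", "Milk", "Diapers"}
print(support_count(itemset, transactions))  # 2
print(support(itemset, transactions))        # 0.4
```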
Rule Evaluation Metrics
• Association rule: Let X and Y denote two disjoint
itemsets, X ∩ Y = ∅; an association rule is an implication
expression of the form X → Y,
e.g.: {Diapers} → {Beer}, or
{Milk, Bread} → {Eggs, Coke}, ....
• An association rule indicates: IF someone buys X, THEN
s/he is likely to buy Y, too.
How to indicate the likelihood?
Which rules are interesting and useful?
How to measure them?
Rule Evaluation Metrics
Each rule has two basic measures
• Support:
– Defined as the ratio of the number of transactions that contain
both X and Y to the total number of transactions in a given dataset
– Represents how frequently the itemsets (X and Y) appear in a given
dataset, i.e., the frequency of the occurring pattern
• Confidence:
– Defined as the ratio of the number of transactions that contain
both X and Y to the number of transactions that contain X
– Indicates the strength of implication in the rule, i.e., how often the
rule has been found to be true
– Represents the conditional probability that Y is true when X is known
to be true
Rule Evaluation Metrics
– S = Support(X → Y) = support count(X ∪ Y) / total number of transactions
– C = Confidence(X → Y) = support(X ∪ Y) / support(X)
                        = support count(X ∪ Y) / support count(X)
                        = Pr(Y | X)
(support count(X ∪ Y) counts the transactions containing every item in
both X and Y; C is a conditional probability of Y given X)

We are only interested in strong rules, which satisfy both the minimum
support threshold and the minimum confidence threshold
Association Rules: An Example
Example of association rules from this
transaction dataset:
{Milk, Diapers} → {Beer}: s=2/5=0.4; c=2/3=0.67
{Milk, Beer} → {Diapers}: s=2/5=0.4; c=2/2=1.00

• The antecedent and consequent of a rule
– IF (certain specified patterns occur in the data)
– THEN (take the appropriate actions)
– The left-hand side of the rule (LHS), or the IF part, is known
technically as the antecedent of the rule
– The right-hand side (RHS), or the THEN part, is called the
consequent
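A minimal sketch of the two metrics on the same assumed five transactions, reproducing the numbers above:

```python
# Minimal sketch: support and confidence of a rule X -> Y, on the
# assumed five-transaction example (repeated so the block runs on its own).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]

def count(itemset):
    return sum(itemset <= t for t in transactions)

def rule_support(X, Y):
    return count(X | Y) / len(transactions)

def rule_confidence(X, Y):
    return count(X | Y) / count(X)

# {Milk, Diapers} -> {Beer}: s = 2/5 = 0.4, c = 2/3 ≈ 0.67
print(rule_support({"Milk", "Diapers"}, {"Beer"}))
print(rule_confidence({"Milk", "Diapers"}, {"Beer"}))
# {Milk, Beer} -> {Diapers}: s = 2/5 = 0.4, c = 2/2 = 1.00
print(rule_confidence({"Milk", "Beer"}, {"Diapers"}))
```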
Association Rule Mining Task
• Given a set of transactions, the goal of
association rule mining is to find all rules having
– Support ≥ minimum support threshold
– Confidence ≥ minimum confidence threshold
• Two-stage approach:
– Frequent Itemset Generation: Generate all itemsets
whose support ≥ minimum support threshold
– Rule Generation: Generate high-confidence rules from
each frequent itemset, i.e., rules with confidence ≥
minimum confidence threshold



Approaches to Association Rule Mining
• Brute-force (naïve) approach:
– List all possible itemsets
– List all possible association rules
– Calculate the support and confidence for each rule
– Prune rules that fail the minimum support and
minimum confidence thresholds
However, this is computationally prohibitive

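A brute-force sketch using itertools, to show why the naïve approach explodes: it materialises every itemset and every rule, which is manageable for six items but hopeless for a real product catalogue.

```python
# Brute-force sketch: enumerate every itemset and every candidate rule.
from itertools import combinations

items = ["Bread", "Milk", "Diapers", "Beer", "Eggs", "Coke"]

# All non-empty itemsets: 2^6 - 1 = 63
itemsets = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]
print(len(itemsets))  # 63

# All candidate rules X -> Y with X, Y non-empty and disjoint,
# drawn from each itemset of size >= 2.
rules = [(frozenset(X), L - frozenset(X))
         for L in itemsets if len(L) >= 2
         for k in range(1, len(L))
         for X in combinations(L, k)]
print(len(rules))  # 602 for d = 6 items
```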


Computational Complexity
• Given d distinct items:
– Total number of itemsets = 2^d
– Total number of possible association rules increases
exponentially as d increases
– Think about how many distinct items a retailer like Tesco
offers, and how many itemsets would have to be checked:

d      2^d
5      32
10     1,024
20     1,048,576
40     ≈1.1 × 10^12

If d = 6, R = 602 rules

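The 602 figure follows from the standard closed form for the number of possible rules over d items: each item can go to the antecedent, the consequent, or neither (3^d assignments), minus the assignments that leave either side empty. A quick check:

```python
# Sketch: the standard closed-form count of candidate rules over d items,
# R = 3^d - 2^(d+1) + 1 (3^d ternary assignments, minus 2^d with an empty
# LHS, minus 2^d with an empty RHS, plus 1 for double-counting both empty).
for d in (5, 6, 10, 20):
    print(d, 3**d - 2**(d + 1) + 1)
# d = 6 gives 602, matching the slide
```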


Computational Complexity
• Itemset lattice: A structure showing all the possible
itemsets, lexicographically ordered, that can be
generated from a given number of distinct items
• Do we have to search and check all the itemsets one by
one?



Generating Frequent Itemsets Efficiently
• The brute-force approach is too expensive and not practical …
How can we find all frequent itemsets efficiently?
• Apriori approach: popular and effective
We know that finding frequent 1-itemsets is easy, so …
• Idea: only use frequent itemsets to generate bigger
itemsets, and ignore any infrequent itemset
• Start with frequent 1-itemsets to generate 2-itemsets,
and use frequent 2-itemsets and 1-itemsets to generate
3-itemsets, and so on ...
• Is this approach valid?
Generating Frequent Itemsets Efficiently
• Apriori principle:
– If an itemset is frequent, then all of its subsets are also frequent,
i.e., if {A, B} is a frequent itemset, then {A} and {B} are frequent itemsets as well
– In general, if X is a frequent k-itemset, then all (k−1)-item subsets
of X are also frequent
• The Apriori principle holds due to the following property of the
support measure:
∀ X, Y: X ⊆ Y ⇒ s(X) ≥ s(Y)
– The support of an itemset never exceeds the support of any of its
subsets
– This is known as the anti-monotone property of support
Discussion on Apriori Principle
• Known: ∀ X, Y: if X ⊆ Y, then s(X) ≥ s(Y)
• Consider the relationship between a subset and
its superset (and vice versa) in terms of support

Scenario: minS > s(X) ≥ s(Y)
– If a subset is infrequent, then any of its supersets
is infrequent
– Equivalently, if a superset is frequent, then any of
its subsets is frequent
Scenario: minS ≤ s(X), with s(X) ≥ s(Y)
– If a subset is frequent, then any of its supersets
may or may not be frequent
– If a superset is infrequent, then any of its
subsets may or may not be frequent
Illustrating Apriori Principle
If itemset CDE is frequent, then any subset of CDE
({C}, {D}, {E}, {C, D}, {C, E}, {D, E}) is frequent

[Figure: itemset lattice over items A–E, from the null itemset down to
ABCDE, with the frequent itemset CDE and all of its subsets highlighted]
Illustrating Apriori Principle
If itemset AB is infrequent, then any itemset containing
AB is infrequent

[Figure: the same itemset lattice, with the infrequent itemset AB
marked and all of its supersets pruned]
Apriori Algorithm: Reducing the Number of
Candidate Itemsets
Given d distinct items and a pre-defined minimum
support threshold:
Apriori Algorithm for Searching and Generating Frequent Itemsets
1: Set k = 1
2: Repeat
List all candidate k-itemsets.
Count the support for each candidate itemset. Select only the frequent k-itemsets that
satisfy the predefined minimum support threshold. Ignore any infrequent
itemsets.
Use the remaining frequent k-itemsets to generate candidate (k+1)-itemsets.
k = k + 1
3: Until k = d
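A minimal sketch of the algorithm above (an assumed implementation, not the lecture's code): each level keeps only the frequent k-itemsets, and a candidate (k+1)-itemset survives only if all of its k-item subsets are frequent.

```python
# Minimal Apriori sketch: level-wise search that only extends frequent
# itemsets, with subset-based candidate pruning.
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent 1-itemsets
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    while level:
        frequent.update({s: support(s) for s in level})
        k = len(next(iter(level)))
        # Candidate (k+1)-itemsets: unions of two frequent k-itemsets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # ... kept only if every k-item subset is frequent (Apriori pruning)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

transactions = [  # the assumed five-transaction example again
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]
for itemset, s in sorted(apriori_frequent_itemsets(transactions, 0.4).items(),
                         key=lambda kv: (len(kv[0]), -kv[1])):
    print(set(itemset), s)
```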


Rule Generation
• Given a frequent itemset L, find all non-empty subsets F ⊂ L such
that F → L − F satisfies the minimum confidence requirement,
i.e., simply split a frequent itemset into two parts, one as the
antecedent and the remainder as the consequent, to form different rules
– Example: If {A, B, C, D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB,
A → BCD, B → ACD, C → ABD, D → ABC
– Note: All of these rules have the same support
• In general, if L contains k items, then there are 2^k − 2 candidate
association rules (ignoring L → ∅ and ∅ → L)
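A small sketch of the splitting step: enumerate every non-empty proper subset of L as an antecedent, with the remaining items as the consequent.

```python
# Sketch: enumerate all 2^k - 2 candidate rules from one frequent itemset.
from itertools import combinations

def candidate_rules(L):
    """All splits of frequent itemset L into antecedent X and consequent L - X."""
    L = frozenset(L)
    for k in range(1, len(L)):          # skip empty antecedent/consequent
        for X in combinations(L, k):
            yield frozenset(X), L - frozenset(X)

rules = list(candidate_rules({"A", "B", "C", "D"}))
print(len(rules))  # 2**4 - 2 = 14
```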


Rule Generation
• How can we efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property, e.g.,
c(ABC → D) can be larger or smaller than c(AB → D)
– But the confidence of rules generated from the same frequent itemset
does have an anti-monotone property with regard to the number of items
on the RHS of the rule
– e.g., for L = {A, B, C, D}:
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
– How to prove this?
c(ABC → D) = s(ABCD)/s(ABC), c(AB → CD) = s(ABCD)/s(AB), c(A → BCD) = s(ABCD)/s(A)
We know s(ABC) ≤ s(AB) ≤ s(A), so all three rules share the same numerator,
and c(ABC → D) has the smallest denominator, hence the biggest value
Rule Generation
• In other words, if c(BCD → A) is low, then any rule containing
A in its consequent (RHS) will have low confidence, e.g.,
CD → AB, BD → AC, BC → AD,
D → ABC, C → ABD, B → ACD
• Important fact: For a given association rule, moving
items from the antecedent to the consequent never
changes support, and never increases confidence


Rule Generation Using Apriori Algorithm
If the confidence for {BCD} → {A} is low, then all the rules
containing item A in their consequent can be disregarded

[Figure: lattice of rules generated from {A, B, C, D}, with the
low-confidence rule BCD → A and all rules below it pruned]
Discussion
• Data format: binary, nominal
• Attribute-value pairs and transactions: data matrix
• Support count - essential
• Confidence is not necessarily the best measure; other measures have
been devised, e.g., lift (correlation):

Lift = Conf(X → Y) / sup(Y) = sup(X ∪ Y) / (sup(X) · sup(Y)) = Pr(X ∪ Y) / (Pr(X) · Pr(Y))

• Lift = 1: X and Y are independent, and the items are randomly purchased together
• Lift < 1: negatively associated – the occurrence of X inhibits the occurrence of Y
• Lift > 1: positively associated – the occurrence of X prompts the occurrence of Y,
and the items are purchased together more often than at random



Discussion
• Suppose a transaction dataset contains milk and bread as frequent
itemsets (out of 2000 transactions on a given day):
– Set min support = 40%
– Set min confidence = 70%

             milk    not milk   Total
bread         900       750     1650
not bread     300        50      350
Total        1200       800     2000

S(bread) = 1650/2000 = 82.5%, S(milk) = 1200/2000 = 60%
S(milk, bread) = 900/2000 = 45%
C(milk → bread) = 900/1200 = 75%, C(bread → milk) = 900/1650 ≈ 54%
Lift = 0.45/(0.6 × 0.825) = 0.91 < 1
Negatively associated: buying one item results in a decrease in buying the
other item
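A quick check of these numbers in Python, reading the counts straight from the contingency table above:

```python
# Sketch: support, confidence, and lift from the contingency table above.
n = 2000
n_bread, n_milk, n_both = 1650, 1200, 900

s_bread = n_bread / n            # 0.825
s_milk = n_milk / n              # 0.60
s_both = n_both / n              # 0.45

conf_milk_to_bread = n_both / n_milk    # 0.75
lift = s_both / (s_milk * s_bread)      # ≈ 0.91 < 1: negatively associated
print(conf_milk_to_bread, lift)
```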
Discussion
• How to set an appropriate minimum support
threshold?
– If it is set too high, we could miss itemsets involving
interesting rare items (e.g., expensive products in transaction
records; unit failures in student records)
– If it is set too low, mining becomes computationally expensive, and the
number of itemsets to create is very large
• Using a single minimum support threshold may not be
effective
• Using the support count or support of each distinct
item as a reference
Using the Support Count or Support of Each
Distinct Item as a Reference
• What would be an appropriate min support threshold in order to find
any association rules relating to item C?
• What would be an appropriate min support threshold in order to find
any association rules relating to item E?



Implement Apriori in Python
• PyCaret: more powerful

• Apyori: simple; works in both Jupyter Notebook and JupyterLab


Use apriori: An Example

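The original slide shows a code screenshot that is not preserved here. Below is a minimal sketch of typical Apyori usage, on the assumed five-transaction example; the field names follow apyori's relation records (items, support, ordered_statistics).

```python
# Minimal apyori sketch (assumed data; install with: pip install apyori).
from apyori import apriori

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diapers", "Beer", "Eggs"],
    ["Milk", "Diapers", "Beer", "Coke"],
    ["Bread", "Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Diapers", "Coke"],
]

results = list(apriori(transactions, min_support=0.4, min_confidence=0.6))
for record in results:
    for stat in record.ordered_statistics:
        print(set(stat.items_base), "->", set(stat.items_add),
              f"support={record.support:.2f}",
              f"confidence={stat.confidence:.2f}",
              f"lift={stat.lift:.2f}")
```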


Use PyCaret: An Example
• Use InvoiceNo along with Description (or
StockCode) for association rule analysis
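A hedged sketch of this workflow, assuming PyCaret 2.x (whose pycaret.arules module provides setup/create_model; it was removed in PyCaret 3); the file name and data are assumptions based on the slide's InvoiceNo/Description columns.

```python
# Sketch assuming PyCaret 2.x's association-rules module (pycaret.arules)
# and a transactional DataFrame with InvoiceNo/Description columns.
import pandas as pd
from pycaret.arules import setup, create_model

data = pd.read_csv("online_retail.csv")  # assumed file name

# Group rows into transactions by InvoiceNo, mine rules over Description
exp = setup(data=data, transaction_id="InvoiceNo", item_id="Description")
rules = create_model(metric="confidence", threshold=0.5, min_support=0.05)
print(rules.head())  # antecedents, consequents, support, confidence, lift
```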


Use PyCaret: An Example (Cont’d)
• Visualise results:

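Continuing the PyCaret 2.x sketch, plot_model renders interactive rule plots; the plot names below are the module's assumed options.

```python
# Sketch, continuing the assumed PyCaret 2.x example above.
from pycaret.arules import plot_model

plot_model(rules, plot="2d")  # support vs confidence scatter (assumed API)
plot_model(rules, plot="3d")  # support, confidence, and lift (assumed API)
```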


Summary
• Association analysis: the basic concepts
• Frequent itemsets, strong rules, support count,
the support and confidence of a rule
• Generating frequent itemsets and strong rules:
– Brute-force approach
– Apriori approach
• Other measures: lift
• How to determine a proper threshold

