
Chapter 10

ASSOCIATION RULE
By:
Aris D.(13406054)
Ricky A.(13406058)
Nadia FR. (13406069)
Amirah K.(13406070)
Paramita AW.(13406091)
Bahana W.(13406102)
Introduction
• Affinity Analysis
 Study of attributes or characteristics that “go
together”.

• Market Basket Analysis

  The method uncovers rules for quantifying the
  relationship between two or more attributes:

  "If antecedent, then consequent"


Affinity Analysis & Market Basket Analysis
• Example:
  A supermarket may find that of the 1,000 customers
  shopping on a Thursday night, 200 bought
  diapers, and of the 200 who bought diapers, 50
  also bought beer.

  The association rule:

  "If buy diapers, then buy beer",
  with support of 50/1000 = 5%
  and confidence of 50/200 = 25%
Affinity Analysis & Market Basket
Analysis (2)
Examples in business & research:
• Investigating the proportion of subscribers to your
company’s cell phone plan that respond positively to an
offer of a service upgrade
• Examining the proportion of children whose parents
read to them who are themselves good readers
• Predicting degradation in telecommunications networks
• Finding out which items in a supermarket are purchased
together & which are never purchased together
• Determining the proportion of cases in which a new drug
will exhibit dangerous side effects
Affinity Analysis & Market Basket
Analysis (3)
• The number of possible association rules grows
  exponentially in the number of attributes.
• With k binary (yes/no) attributes, there are
  k · 2^(k-1) possible association rules.
• Example: a convenience store that sells 100
  items. Possible association rules = 100 · 2^99 ≈
  6.4 × 10^31 (verified in the sketch below)
• The a priori (preliminary) algorithm reduces the
  search problem to a more manageable size.
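
A quick check of this arithmetic, as a minimal Python sketch (the function name is illustrative):

# k binary attributes admit k * 2^(k-1) possible association rules.
def possible_rules(k):
    return k * 2 ** (k - 1)

print(possible_rules(100))  # 63382530011411470074835160268800, i.e. ~6.4 x 10^31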
Notation for Data Representation in
Market Basket Analysis
• A farmer sells the item set I = {asparagus, beans,
  broccoli, corn, green peppers, squash, tomatoes}
• A customer puts a subset of I into a basket, e.g.,
  {broccoli, corn}
• The basket records only which items were
  purchased, not how much of each item.
Transactional Data Format
Tabular Data Format
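
The two formats hold the same information; a minimal sketch of the difference in Python, using hypothetical transactions over the farmer's item set:

# Transactional format: one (transaction id, item) pair per row.
transactional = [
    (1, "broccoli"), (1, "corn"),
    (2, "asparagus"), (2, "corn"), (2, "squash"),
]

# Tabular (flag) format: one row per transaction, one 0/1 flag per item.
items = ["asparagus", "beans", "broccoli", "corn",
         "green peppers", "squash", "tomatoes"]
tabular = {
    tid: {item: int((tid, item) in transactional) for item in items}
    for tid in {t for t, _ in transactional}
}
print(tabular[1]["corn"], tabular[1]["beans"])  # 1 0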
Support, Confidence, Frequent
Itemsets, & the Apriori Property
• Example:
D : set of transactions represented in Table 10.1
T : a transaction in D represents a set of items
I : set of items
Set of items A : {beans, squash}
Set of items B : {asparagus}

THEN …
An association rule takes the form "if A, then B" (A ⇒ B),
where A and B are PROPER subsets of I and are mutually
exclusive.
Table of Transactions Made
 Support and Confidence
• Support, s, is the proportion of transactions in D
  that contain both A and B.
  support = P(A ∩ B)
          = (number of transactions containing both A and B)
            / (total number of transactions)
• Confidence, c, is a measure of the accuracy of the
  rule.
  confidence = P(B|A) = P(A ∩ B) / P(A)
             = (number of transactions containing both A and B)
               / (number of transactions containing A)

• Analysts prefer rules with both high support AND
  high confidence (as computed in the sketch below).
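
Both measures can be computed directly from the transaction list; a minimal sketch, with hypothetical data:

# Support and confidence for a rule A => B over a list of transactions.
transactions = [
    {"beans", "squash", "asparagus"},
    {"beans", "corn"},
    {"asparagus", "beans", "squash"},
    {"corn", "tomatoes"},
]

def support(transactions, A, B):
    # P(A and B): fraction of transactions containing all items of A and B
    return sum(1 for t in transactions if (A | B) <= t) / len(transactions)

def confidence(transactions, A, B):
    # P(B | A): of the transactions containing A, the fraction also containing B
    has_A = [t for t in transactions if A <= t]
    return sum(1 for t in has_A if B <= t) / len(has_A)

A, B = {"beans", "squash"}, {"asparagus"}
print(support(transactions, A, B))     # 0.5 (2 of 4 transactions)
print(confidence(transactions, A, B))  # 1.0 (2 of the 2 containing A)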
 Frequent Itemset
 Definition…
  An itemset is a set of items contained in I, and a
  k-itemset contains k items.
  e.g.: {beans, squash} is a 2-itemset
 The itemset frequency…
  the number of transactions that contain the
  particular itemset
 A frequent itemset…
  an itemset that occurs at least a certain minimum
  number of times, i.e., has itemset frequency ≥ φ
  Example:
  Set φ = 4; then itemsets that occur four or more
  times are said to be frequent.
 The Apriori Property
• Mining association rules is a two-step process:
1. Find all frequent itemsets (all itemsets with
   frequency ≥ φ)
2. From the frequent itemsets, generate
   association rules satisfying the minimum
   support and confidence conditions

• The apriori property states that if an itemset Z is
  not frequent, then adding another item A to Z
  cannot make it frequent: no superset of an
  infrequent itemset is frequent. This helpful
  property significantly reduces the search space
  for the a priori algorithm.
How does the Apriori Algorithm Work?

• Part 1: Generating Frequent Itemsets
• Part 2: Generating Association Rules
Generating Frequent Itemsets
• Example:
  Let φ = 4, so that an itemset is frequent if it occurs
  four or more times in D.

  F1 = {asparagus, beans, broccoli, corn, green
  peppers, squash, tomatoes}

  To find F2, the algorithm first constructs a set C2
  of candidate 2-itemsets by joining F1 with itself,
  then prunes C2 using the a priori property. In
  general, Ck is built by joining Fk-1 with itself; for
  k = 2, Ck consists of all the combinations of
  vegetables in Table 10.4.

  Finding F3 follows the same steps as for F2, but
  with k = 3.
Table 10.3 (pg. 183)
Table 10.4 (pg. 185)
• However, consider s = {beans, corn, squash}:
  the subset {corn, squash} has frequency 3 < 4 = φ,
  so {corn, squash} is not frequent.
  By the apriori property, therefore, {beans, corn,
  squash} cannot be frequent; it is pruned and does
  not appear in F3.

  The same holds for s = {beans, squash, tomatoes}:
  one of its subsets has frequency < 4.
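
A minimal sketch of Part 1 under these conventions (join Fk-1 with itself, prune with the apriori property, keep itemsets with frequency ≥ φ); the basket data here is hypothetical, not Table 10.1:

from itertools import combinations

def frequent_itemsets(transactions, phi):
    """Return the frequent itemsets of each size, levelwise."""
    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    levels = [{frozenset([i]) for i in items if freq(frozenset([i])) >= phi}]
    while levels[-1]:
        k = len(next(iter(levels[-1]))) + 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in levels[-1] for b in levels[-1]
                      if len(a | b) == k}
        # Prune step (apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in levels[-1]
                             for s in combinations(c, k - 1))}
        levels.append({c for c in candidates if freq(c) >= phi})
    return [lv for lv in levels if lv]

baskets = [{"beans", "squash"}, {"beans", "corn", "squash"},
           {"beans", "squash"}, {"beans", "squash", "tomatoes"}]
print(frequent_itemsets(baskets, phi=3))
# [{frozenset({'beans'}), frozenset({'squash'})}, {frozenset({'beans', 'squash'})}]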
Generating Association Rules
For each frequent itemset s (see the sketch below):
1. Generate all nonempty proper subsets ss of s.
2. Form the candidate rule R : ss ⇒ (s − ss), where
   (s − ss) is the set s without ss.
   Generate R if it fulfills the minimum confidence
   requirement.
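
A minimal sketch of this step; it assumes the itemset frequencies from Part 1 are available in a dictionary (the names and counts here are illustrative):

from itertools import combinations

def generate_rules(s, freq, n_transactions, min_conf):
    """Yield rules ss => (s - ss) from frequent itemset s that meet min_conf."""
    for r in range(1, len(s)):                  # all nonempty proper subsets ss
        for ss in map(frozenset, combinations(s, r)):
            conf = freq[s] / freq[ss]           # P(s - ss | ss)
            if conf >= min_conf:
                yield set(ss), set(s - ss), freq[s] / n_transactions, conf

# Hypothetical frequencies over 14 transactions:
freq = {frozenset({"beans"}): 10, frozenset({"squash"}): 7,
        frozenset({"beans", "squash"}): 6}
for antecedent, consequent, sup, conf in generate_rules(
        frozenset({"beans", "squash"}), freq, 14, min_conf=0.8):
    print(antecedent, "=>", consequent, round(sup, 3), round(conf, 3))
# {'squash'} => {'beans'} 0.429 0.857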
Example with two antecedents

• Total transactions = 14
• Transactions including asparagus and beans = 5
• Transactions including asparagus and squash = 5
• Transactions including beans and squash = 6

Ranked by support × confidence

• Minimum confidence = 80%
Clementine generating Association
Rules
Clementine generating Association
Rules (2)
• In Clementine, “support” means the number of
  occurrences of the antecedent, which differs from
  the definition given earlier.
• The first column indicates how often the
  antecedent occurs.
• To find the actual support using Clementine,
  multiply support by confidence.
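
This works because P(antecedent) × P(consequent | antecedent) = P(antecedent ∩ consequent), the support defined earlier. A one-line check, with illustrative numbers:

clementine_support = 5 / 14             # P(antecedent): 5 of 14 transactions
confidence = 0.80                       # P(consequent | antecedent)
print(clementine_support * confidence)  # ~0.286 = P(antecedent and consequent)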
Extension From Flag Data to General
Categorical Data

- Association rules are not only for flag (Boolean)
  data.
- The a priori algorithm can also be applied to
  categorical data.
Example using Clementine
• Recall the normalized adult data set from
  chapters 6 and 7
Information-Theoretic Approach:
Generalized Rule Induction Method
Why GRI?
• The a priori algorithm is not well equipped to
  handle numerical attributes; they must first be
  discretized.
• Discretization can lead to loss of information.
• GRI can handle both categorical and numerical
  variables as inputs, but still requires a categorical
  variable as output.
Generalized Rule Induction Method (2)
J-Measure

$$ J = p(x)\left[\,p(y|x)\,\ln\frac{p(y|x)}{p(y)} + \bigl(1 - p(y|x)\bigr)\,\ln\frac{1 - p(y|x)}{1 - p(y)}\right] $$

• p(x): probability of the value of x (antecedent)
• p(y): probability of the value of y (consequent)
• p(y|x): conditional probability of y given that x
  has occurred
Generalized Rule Induction Method (3)
• The J-measure quantifies the “interestingness” of
  a rule.
• In GRI, the user specifies how many association
  rules should be reported.
• If the “interestingness” of a new rule exceeds the
  current minimum J in the rule table, the new rule
  is inserted and the rule with the minimum J is
  eliminated.
Application of GRI
p(x) : female, never married
p(x) = 0.1463
Application of GRI (2)
p(y) : work class = private
p(y) = 0.6958
Application of GRI (3)
p(y|x) : work class = private;
given : female, never married

p(y|x) = 0.763
Application of GRI
Calculation:

$$ \begin{aligned} J &= p(x)\left[\,p(y|x)\,\ln\frac{p(y|x)}{p(y)} + \bigl(1 - p(y|x)\bigr)\,\ln\frac{1 - p(y|x)}{1 - p(y)}\right] \\ &= 0.1463\left[\,0.763\,\ln\frac{0.763}{0.6958} + 0.237\,\ln\frac{0.237}{0.3042}\right] \\ &= 0.1463\bigl[\,0.763\,\ln(1.0966) + 0.237\,\ln(0.7791)\bigr] \\ &= 0.001637 \end{aligned} $$
When not to use Association Rules
• Association rules chosen a priori can be selected
  based on:
▫ Confidence
▫ Confidence difference
▫ Confidence ratio

• Association rules need to be applied with care,
  because the results are sometimes unreliable.
When not to use Association Rules (2)
Association Rules chosen a priori, based on confidence
• Applying this association rule actually reduces the
  probability of selecting the desired records,
  compared with random selection.
• Even though the rule is useless, the software still
  reported it, probably because the default ranking
  mechanism for the a priori algorithm is
  confidence.
• We should never simply believe computer output
  without making the effort to understand the
  models and mechanisms underlying the results.
When not to use Association Rules (3)
Association Rules chosen a priori, based on confidence
When not to use Association Rules (4)
Association Rules chosen a priori, based on confidence difference

• A random selection from the database would
  have provided more effective results than
  applying the (useless) association rule.

• This criterion selects the rule that provides the
  greatest increase in confidence from the prior to
  the posterior.

• The evaluation measures the absolute difference
  between the prior and posterior confidences.
When not to use Association Rules (5)
Association Rules chosen a priori, based on confidence difference
When not to use Association Rules (6)
Association Rules chosen a priori, based on confidence ratio

• Analysts may prefer to use the confidence ratio to
  evaluate potential rules.

• In this case, the confidence difference criterion
  yielded the very same rules as the confidence
  ratio criterion.
When not to use Association Rules (7)
Association Rules chosen a priori, based on confidence ratio

• Example:
  If Marital_Status = Divorced, then Sex = Female,
  with p(y) = 0.3317 and p(y|x) = 0.60
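
A minimal sketch of these two criteria for the example rule; the confidence difference is the absolute prior-to-posterior difference described above, while the exact ratio formula Clementine uses is an assumption here (1 minus the smaller of the two ratios):

def confidence_difference(prior, posterior):
    # |posterior - prior|: absolute change in confidence.
    return abs(posterior - prior)

def confidence_ratio(prior, posterior):
    # Assumed formulation: 1 - min(posterior/prior, prior/posterior).
    return 1 - min(posterior / prior, prior / posterior)

# If Marital_Status = Divorced, then Sex = Female: p(y) = 0.3317, p(y|x) = 0.60
print(confidence_difference(0.3317, 0.60))  # ~0.268
print(confidence_ratio(0.3317, 0.60))       # ~0.447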
Do Association Rules Represent
Supervised or Unsupervised Learning?
• Supervised learning:
▫ A target variable is prespecified
▫ The algorithm is provided with a rich collection of examples
  where possible associations between the target variable and
  the predictor variables may be uncovered
• Unsupervised learning:
▫ No target variable is identified explicitly
▫ The algorithm searches for patterns and structure among all the
  variables

• Association rules are generally used for unsupervised learning, but can
  also be applied in supervised learning for classification tasks.
Local Patterns Versus Global Models
 Model: a global description or explanation of a
  data set.
 Patterns: essential local features of the data.
 Association rules are well suited to uncovering
  local patterns in data.
 Applying the “if” clause drills down deep into the
  data set, uncovering hidden local patterns that
  might be relevant.
 Finding local patterns is one of the most
  important goals in data mining; it can lead to
  new profitable initiatives.
