
Lecture 6:

Association Rules
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Overview
• Association rules are commonly known as Market Basket Analysis.
• An unsupervised learning method.
• A descriptive, not a predictive, method.
• Used to discover interesting hidden relationships in a large dataset.
• The discovered relationships are represented as rules or frequent itemsets.
• Commonly used for mining transactions in databases.
Overview
• Some questions that association rules can answer:
• Which products tend to be purchased together?
• Of those customers who are similar to this person, what products do they tend to buy?
• Of those customers who have purchased this product, what other similar products do they tend to view or purchase?

The aim is to analyze customer buying habits by finding associations and correlations between the different items that customers place in their shopping baskets.
Overview
The general logic behind association rules:

1. Start with a large collection of transactions (depicted as three stacks of receipts, in which each transaction consists of one or more items).

2. Association rules go through the items being purchased to see which items are frequently bought together, and to discover a list of rules that describe the purchasing behavior.

3. The rules suggest, for example, that when cereal is purchased, 90% of the time milk is also purchased; when bread is purchased, 40% of the time milk is also purchased; and when milk is purchased, 23% of the time cereal is also purchased.
Overview
Rules
• Each rule is of the form X → Y.
• Meaning: when item X is observed, item Y is also observed.

Itemset
• A collection of items or individual entities among which some kind of relationship exists.
• An itemset containing k items is called a k-itemset.
• k-itemset = {item 1, item 2, …, item k}
• Examples:
• A set of retail items purchased together in one transaction.
• A set of hyperlinks clicked on by one user in a single session.
Overview
Apriori Algorithm
• One of the most fundamental algorithms for generating association rules.
• One major component of Apriori is support:
• Given an itemset L, the support of L is the percentage of transactions that contain L.
• If 80% of all transactions contain itemset {bread}, then the support of {bread} is 0.8.
• If 60% of all transactions contain itemset {bread, butter}, then the support of {bread, butter} is 0.6.

Frequent Itemset
• Items that appear together “often enough”, i.e. the itemset meets the minimum support criterion.
• If the minimum support is set at 0.7, {bread} is considered a frequent itemset, whereas {bread, butter} is not.
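A minimal Python sketch of how support can be computed from a list of transactions. The transactions below are illustrative only, chosen to reproduce the 0.8 and 0.6 figures above:

# Illustrative transactions; each transaction is a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
    {"coke"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"bread"}, transactions))           # 0.8
print(support({"bread", "butter"}, transactions)) # 0.6

With a minimum support of 0.7, support({"bread"}, transactions) >= 0.7 is True while support({"bread", "butter"}, transactions) >= 0.7 is False, matching the frequent-itemset example above.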
Overview
Apriori Property
• Also called the downward closure property.
• If an itemset is frequent, then any subset of that frequent itemset must also be frequent.
• If 60% of the transactions contain {bread, jam}, then at least 60% of all the transactions will contain {bread} or {jam}.
• In other words, if the support of {bread, jam} is 0.6, the support of {bread} or {jam} is at least 0.6.
• If itemset {B, C, D} is frequent, then all the subsets of this itemset, shaded in the figure, must also be frequent itemsets.
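A quick sanity check of the downward closure property, reusing the support function and illustrative transactions sketched above:

from itertools import combinations

itemset = {"bread", "butter"}
s = support(itemset, transactions)  # 0.6 with the illustrative data above
# Every non-empty proper subset must have support at least as large.
assert all(support(set(sub), transactions) >= s
           for r in range(1, len(itemset))
           for sub in combinations(itemset, r))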
Overview
Association Rules
• An implication expression of the form X → Y, where X and Y are non-overlapping itemsets.
• E.g. {Milk, Diaper} → {Beer}
• Generating association rules:
• Step 1: Find frequent itemsets whose occurrences exceed a predefined minimum support threshold.
• Step 2: Derive association rules from those frequent itemsets (subject to a minimum confidence threshold).

Rule Evaluation Metrics
• Support (s): the number of transactions that contain both X and Y, out of the total number of transactions.
• E.g. a support of 2% means that 2% of all the transactions under analysis show that {Milk, Diaper} and {Beer} are purchased together.
• Confidence (c): the number of transactions that contain both X and Y, out of the total number of transactions that contain X.
• E.g. a confidence of 60% means that 60% of customers who purchased {Milk, Diaper} also bought {Beer}.
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Apriori Algorithm
Creating Frequent Sets
• Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
• First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted L1.
• Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
• Finding each Lk requires one full scan of the database.
Example 1
(5 transactions; 6 types of items; MinSupp = 3/5 = 0.6, so a frequent itemset must appear in at least 3 of the 5 transactions)

L1: Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

L2: Pairs (2-itemsets); no need to generate candidates involving Coke or Eggs, since neither is frequent
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

L3: Triplets (3-itemsets)
Itemset               Count
{Bread,Milk,Diaper}   2
{Bread,Milk,Beer}     1
{Bread,Diaper,Beer}   2

No 3-itemset meets the minimum support, so the frequent itemsets are the four frequent items and the four pairs with count 3.
Apriori Algorithm
Creating Frequent Sets

• Let’s define:
Ck as a candidate itemset of size k
Lk as a frequent itemset of size k

• Main steps of each iteration (sketched in code below):
1. Find the frequent itemset Lk-1 (starting from L1).
2. Join step: Ck is generated by joining Lk-1 with itself (Cartesian product Lk-1 × Lk-1).
3. Prune step (Apriori property): any itemset of size k−1 that is not frequent cannot be a subset of a frequent itemset of size k, so candidates containing one are removed from Ck.
4. The frequent set Lk is obtained from the surviving candidates that meet minimum support.
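A minimal Python sketch of the join and prune steps, assuming itemsets are represented as frozensets (the helper names join and prune are illustrative):

from itertools import combinations

def join(prev_frequent, k):
    # Join step: form candidate k-itemsets by unioning frequent (k-1)-itemsets.
    return {a | b for a in prev_frequent for b in prev_frequent if len(a | b) == k}

def prune(candidates, prev_frequent, k):
    # Prune step (Apriori property): drop any candidate that has an
    # infrequent (k-1)-subset.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))}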
Apriori Algorithm
Illustrating Apriori Principle

• Any subset of a frequent itemset must also be frequent.

• Itemsets that do not meet the minimum support threshold are pruned away.
Apriori Algorithm
Pseudo Code
L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
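A runnable Python sketch of this loop, reusing the join and prune helpers sketched earlier. The transaction list is an assumption chosen to be consistent with the counts in Example 1, since the slide shows only the counts:

def apriori(transactions, min_support):
    # Return a dict mapping each frequent itemset (frozenset) to its support.
    n = len(transactions)

    def sup(itemset):
        # Fraction of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # L1: frequent 1-itemsets.
    current = {frozenset([item]) for t in transactions for item in t}
    current = {c for c in current if sup(c) >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update({c: sup(c) for c in current})
        k += 1
        candidates = prune(join(current, k), current, k)
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

# Transactions consistent with Example 1's counts (MinSupp = 0.6).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
for itemset, s in apriori(transactions, min_support=0.6).items():
    print(sorted(itemset), s)
# Frequent 1-itemsets: {Bread}, {Milk}, {Diaper} (support 0.8) and {Beer} (0.6);
# frequent 2-itemsets: {Bread,Milk}, {Bread,Diaper}, {Milk,Diaper}, {Beer,Diaper} (0.6);
# no 3-itemset survives, matching Example 1.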
Example 2

Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Evaluation of Candidate Rules
The process of creating association rules is two-staged.
• First, a set of candidate rules based on the frequent itemsets is generated (see the sketch after this list).
• If {Bread, Egg, Milk, Butter} is the frequent itemset, candidate rules will look like:
• {Egg, Milk, Butter} → {Bread}
• {Bread, Milk, Butter} → {Egg}
• {Bread, Egg} → {Milk, Butter}
• Etc.
• Second, the appropriateness of these candidate rules is evaluated using:
• Confidence
• Lift
• Leverage
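A minimal sketch of that first stage for a single frequent itemset: every non-empty proper subset becomes an antecedent, and the rest of the itemset becomes the consequent (the function name candidate_rules is illustrative):

from itertools import combinations

def candidate_rules(itemset):
    # Yield (antecedent, consequent) candidate rules from one frequent itemset.
    items = sorted(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = frozenset(items) - frozenset(antecedent)
            yield frozenset(antecedent), consequent

for x, y in candidate_rules({"Bread", "Egg", "Milk", "Butter"}):
    print(sorted(x), "->", sorted(y))  # 2^4 - 2 = 14 candidate rules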
Evaluation of Candidate Rules
Confidence
• The measure of certainty or trustworthiness associated with each discovered rule.
• Mathematically, the percentage of transactions that contain both X and Y out of all the transactions that contain X:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)

• E.g. if {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} → {milk} is 1.
• This means that 100% of the time a customer buys bread and eggs, milk is bought as well. The rule is therefore correct for 100% of the transactions containing bread and eggs.
Evaluation of Candidate Rules
Confidence
• A relationship may be thought of as interesting when the algorithm identifies the relationship with a measure of confidence greater than or equal to a predefined threshold (i.e. the minimum confidence).
• E.g. {Toothbrush} → {Milk}: Confidence = 10/(10+4) ≈ 0.7

• Problem with confidence:
• Given a rule X → Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y.
• It cannot tell whether a rule reflects a true implication in the relationship or is purely coincidental.

https://towardsdatascience.com/association-rules-2-aa9a77241654
Evaluation of Candidate Rules
Lift
• Measures how many times more often X and Y occur together than expected if they were statistically independent of each other.
• A measure of how X and Y are really related, rather than coincidentally happening together:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) ∗ Support(Y))

• Lift = 1 if X and Y are statistically independent.
• Lift > 1 indicates the degree of usefulness of the rule.
• A larger value of lift suggests a greater strength of the association between X and Y.
Evaluation of Candidate Rules
Lift
• E.g. assuming 1000 transactions:
• If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the transactions, then Lift(milk → eggs) = 0.3 / (0.5 ∗ 0.4) = 1.5
• If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the transactions, then Lift(milk → bread) = 0.4 / (0.5 ∗ 0.4) = 2.0
• Therefore it can be concluded that milk and bread have a stronger association than milk and eggs.
Evaluation of Candidate Rules
Leverage
• Measures the difference between the probability of X and Y appearing together in the dataset and what would be expected if X and Y were statistically independent of each other:

Leverage(X → Y) = Support(X ∪ Y) − Support(X) ∗ Support(Y)

• Leverage = 0 if X and Y are statistically independent.
• Leverage > 0 indicates the degree of relationship between X and Y.
• A larger leverage value indicates a stronger relationship between X and Y.
Evaluation of Candidate Rules
Leverage
• E.g. assuming 1000 transactions:
• If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the transactions, then Leverage(milk → eggs) = 0.3 − 0.5 ∗ 0.4 = 0.1
• If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the transactions, then Leverage(milk → bread) = 0.4 − 0.5 ∗ 0.4 = 0.2
• This again confirms that milk and bread have a stronger association than milk and eggs.
Summary
• Assuming 1000 transactions:
• {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the transactions.
• {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the transactions.

             {Milk} → {Eggs}            {Milk} → {Bread}
Confidence   0.3 / 0.5 = 0.6            0.4 / 0.5 = 0.8
Lift         0.3 / (0.5 × 0.4) = 1.5    0.4 / (0.5 × 0.4) = 2.0
Leverage     0.3 − 0.5 × 0.4 = 0.1      0.4 − 0.5 × 0.4 = 0.2
             Smaller                    Greater

Milk and bread have a stronger association than milk and eggs.
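A short Python sketch that reproduces the table's numbers from the stated counts (1000 transactions; the helper names are illustrative):

n = 1000  # total transactions, as stated above
count = {
    frozenset({"milk"}): 500,
    frozenset({"eggs"}): 400,
    frozenset({"bread"}): 400,
    frozenset({"milk", "eggs"}): 300,
    frozenset({"milk", "bread"}): 400,
}

def support(items):
    return count[frozenset(items)] / n

def confidence(x, y):
    return support(x | y) / support(x)

def lift(x, y):
    return support(x | y) / (support(x) * support(y))

def leverage(x, y):
    return support(x | y) - support(x) * support(y)

x = {"milk"}
for y in ({"eggs"}, {"bread"}):
    print(sorted(y),
          round(confidence(x, y), 3),  # 0.6 for eggs, 0.8 for bread
          round(lift(x, y), 3),        # 1.5 for eggs, 2.0 for bread
          round(leverage(x, y), 3))    # 0.1 for eggs, 0.2 for bread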
Evaluation of Candidate Rules
• Confidence is able to identify trustworthy rules, but it cannot tell whether a rule is coincidental.

• Measures such as lift and leverage not only ensure that interesting rules are identified but also filter out the coincidental rules.

• Together, support, confidence, lift, and leverage ensure the discovery of interesting and strong rules from a sample dataset.
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Example 3
• Five market basket transactions, with items labelled as I1, I2, I3, and so on.
Example 4
• Consider a database, D, consisting of 9 transactions:

TID     List of Items
T100    I1, I2, I5
T101    I2, I4
T102    I2, I3
T103    I1, I2, I4
T104    I1, I3
T105    I2, I3
T106    I1, I3
T107    I1, I2, I3, I5
T108    I1, I2, I3

• Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required be 70%.
• We first have to find the frequent itemsets using the Apriori algorithm.
• Then, association rules are generated using the minimum support and minimum confidence (see the sketch after this list).
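Assuming the apriori and candidate_rules sketches from earlier in the lecture are in scope, the frequent itemsets and high-confidence rules for this database can be reproduced along these lines:

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

frequent = apriori(transactions, min_support=2 / 9)

def sup(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

# Keep only rules whose confidence meets the 70% threshold.
for itemset in frequent:
    if len(itemset) > 1:
        for x, y in candidate_rules(itemset):
            confidence = sup(x | y) / sup(x)
            if confidence >= 0.7:
                print(sorted(x), "->", sorted(y), round(confidence, 2))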
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Applications of Association Rules
The term market basket analysis refers to a specific implementation of association rules.
• For better merchandising – products to include in or exclude from inventory each month
• Placement of products
• Cross-selling
• Promotional programs – multiple-product purchase incentives managed through a loyalty card program
Applications of Association Rules
• Input: the simple point-of-sale transaction data
• Output: the most frequent affinities among items

• Example: according to the transaction data…
“Customers who bought a laptop computer and virus protection software also bought an extended service plan 70 percent of the time.”

• How do you use such a pattern/knowledge?
• Put the items next to each other so they are easy to find together
• Promote the items as a package (do not put one on sale if the other(s) are on sale)
• Place the items far apart from each other, so that the customer must walk the aisles to search for them and, by doing so, potentially sees and buys other items
Applications of Association Rules
Recommender systems – Amazon, Netflix:
• Clickstream analysis from web usage log files
• E.g. website visitors to page X click on links A, B, C more than on links D, E, F

In medicine:
• Relationships between symptoms and illnesses
• Diagnosis, patient characteristics, and treatments (to be used in medical DSS)
• Genes and their functions (to be used in genomics projects)
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Validation and Testing
• The frequent and high-confidence itemsets are found using pre-specified minimum support and minimum confidence levels.

• Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones.

• However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions.

• E.g. rules like {paper} → {pencil} are not interesting/meaningful.

• Incorporating subjective knowledge requires domain experts.

• Good rules provide valuable insights for institutions to improve their business operations.
Outline
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Examples

Application of Association Rules

Validation and Testing

Diagnostics
Diagnostics
• Although the Apriori algorithm is easy to understand and implement, some of the rules generated are uninteresting or practically useless.

• Additionally, some of the rules may be generated due to coincidental relationships between the variables.

• Measures like confidence, lift, and leverage should be used along with human insight to address this problem.
Diagnostics
• Another problem with association rules is that, in Phases 3 and 4 of the Data Analytics Lifecycle, the team must specify the minimum support prior to model execution, which may lead to too many or too few rules.

• In related research, a variant of the algorithm can use a predefined target range for the number of rules, so that the algorithm can adjust the minimum support accordingly.

• The algorithm requires a scan of the entire database to obtain the result. Accordingly, as the database grows, each run takes more time to compute.
Diagnostics
Approaches to improve Apriori’s efficiency:

Partitioning:
• Any itemset that is potentially frequent in a transaction database must be frequent in at least one of the partitions of the transaction database.

Sampling:
• Extract a subset of the data with a lower support threshold and use the subset to perform association rule mining.

Transaction reduction:
• A transaction that does not contain any frequent k-itemsets is useless in subsequent scans and can therefore be ignored.

Hash-based itemset counting:
• If the corresponding hash bucket count of a k-itemset is below a certain threshold, the k-itemset cannot be frequent.

Dynamic itemset counting:
• Only add new candidate itemsets when all of their subsets are estimated to be frequent.
