
Data Mining

Association Rule Mining

Adapted from
Data Mining: Concepts and Techniques by
Han, Kamber & Pei

1
Outline

Basic Concepts

Frequent Itemset Mining using Apriori Algorithm

Which Patterns Are Interesting?

Demonstration of frequent itemset mining and generating association rules using Python

2
What Is Association Rule Mining?

Imagine that you are a sales manager at the All Electronics store and you are talking to a customer who recently bought a PC and a digital camera from the store.
What should you recommend to her next?
Information about which products are frequently purchased together by your customers can be very useful for making such recommendations.
Frequent patterns and association rules are the knowledge that you want to mine in such a scenario.
3
What Is Association Rule Mining?

Association rule mining is a machine learning technique used to find associations/relations between items/products/variables in large transactional data sets.
This helps us understand customers' buying habits.
To find associations between items, we first need to find frequent itemsets in the transactional data.

4
What Is Frequent Pattern Analysis?

Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that appears frequently in a data set.
e.g. a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset.
First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining.
Frequent itemset mining (finding frequent patterns) leads to the discovery of associations and correlations among items in large transactional data sets.
A typical example of frequent itemset mining is market basket analysis: the process of analyzing customer buying habits by finding associations between the different items that customers place in their shopping baskets.
Frequent patterns are presented in the form of association rules.
5
Market Basket Analysis

6
Applications

The discovery of interesting correlations among huge amounts of transaction data helps in business decision-making processes such as catalog design, cross-marketing, and analysis of customer shopping behavior.
Products that are frequently purchased together can be bundled, and a discount can be offered to increase sales.
Design of store layout:
Strategy 1: items that are purchased together can be placed in proximity.
Strategy 2: place them at opposite ends of the store, so that customers who purchase such items pick up other items along the way.
7
Basic Concepts: Frequent Patterns

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

An itemset is a set of items; an itemset that contains k items is called a k-itemset, X = {x1, …, xk}.
Absolute support, or support count, of X: the frequency of occurrence of the itemset X. e.g. the support count of {Beer, Diaper} is 3.
Relative support, or support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X). e.g. the support of {Beer, Diaper} is 3/5.
An itemset X is frequent if X's support is not less than a minsup threshold.

(Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both.)

8
Basic Concepts: Association Rules

Frequent patterns are represented in the form of rules.
Support and confidence are the two measures of rule interestingness.
Association rules are written as: X => Y [support, confidence].
support, s: the probability that a transaction contains X ∪ Y.
confidence, c: the conditional probability that a transaction containing X also contains Y.
e.g. Diaper => Beer [support = 60%, confidence = 75%]
Support is the percentage of transactions that contain both X and Y (Diaper and Beer). A support value of 60% means that 60% of all the transactions under analysis show that beer and diapers are purchased together.
Confidence is the percentage of transactions containing X that also contain Y. A confidence value of 75% means that 75% of the customers who purchased diapers also bought beer.
9
Basic Concepts: Association Rules

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

Example association rules:
Beer => Diaper (support = 60%, confidence = 100%)
Diaper => Beer (support = 60%, confidence = 75%)

support(Beer => Diaper) = no. of transactions containing both {Beer, Diaper} / total no. of transactions = 3/5 = 60%
confidence(Beer => Diaper) = no. of transactions containing both {Beer, Diaper} / no. of transactions containing Beer = 3/3 = 100%

Support signifies how popular an itemset is; confidence signifies the likelihood of item Y being purchased when item X is purchased.
10
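To make the definitions concrete, here is a small self-contained Python sketch (not from the slides) that computes the support and confidence of a rule X => Y over the five-transaction data set above; the function name support_confidence is illustrative.

def support_confidence(transactions, X, Y):
    """Support and confidence of the rule X => Y over a list of transactions."""
    X, Y = set(X), set(Y)
    n_both = sum(1 for t in transactions if X | Y <= set(t))  # transactions with X and Y
    n_x = sum(1 for t in transactions if X <= set(t))         # transactions with X
    return n_both / len(transactions), n_both / n_x

tdb = [
    {'Beer', 'Nuts', 'Diaper'},
    {'Beer', 'Coffee', 'Diaper'},
    {'Beer', 'Diaper', 'Eggs'},
    {'Nuts', 'Eggs', 'Milk'},
    {'Nuts', 'Coffee', 'Diaper', 'Eggs', 'Milk'},
]
print(support_confidence(tdb, {'Beer'}, {'Diaper'}))   # (0.6, 1.0)
print(support_confidence(tdb, {'Diaper'}, {'Beer'}))   # (0.6, 0.75)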
Basic Concepts: Association Rules

Association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets (k-itemsets that are frequently purchased together).
2. Generate strong rules from the frequent itemsets.

Apriori is a seminal algorithm proposed by Agrawal and Srikant in 1994 for mining frequent itemsets.

11
Apriori: A Candidate Generation & Test Approach

Apriori pruning principle: if there is any itemset that is infrequent, its supersets should not be generated/tested!
The Apriori algorithm employs an iterative approach in which frequent k-itemsets are used to generate candidate (k+1)-itemsets.
Method:
Initially, scan the DB once to get the frequent 1-itemsets.
Generate the frequent 2-itemsets, then the frequent 3-itemsets, and so on.
Terminate when no frequent candidate set can be generated.
12
The Apriori Algorithm—An Example (min_sup = 2)

Database TDB:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan, C1 (candidate 1-itemsets with support counts):
{A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1 (frequent 1-itemsets; {D} is pruned, sup < 2):
{A}: 2, {B}: 3, {C}: 3, {E}: 3

C2 (candidates from L1 join L1):
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan, counts:
{A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

C3: {B, C, E}
3rd scan, L3: {B, C, E}: 2
13
Apriori Algorithm

C3 = L2 join L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}

Nonempty subsets of {I1,I3,I5}: {I1}, {I3}, {I5}, {I1,I3}, {I1,I5}, {I3,I5}

An itemset can be frequent only if it satisfies the minimum support threshold and all of its nonempty subsets are also frequent; candidates in C3 that have an infrequent subset are therefore pruned before the database scan.
The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
15
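The pseudocode above translates almost line for line into Python. The following is a minimal sketch (not from the slides): min_support here is an absolute count, and the union-based join is a simplification that stands in for the textbook's prefix-based self-join.

from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) with its support count."""
    # 1st scan: count single items to get the frequent 1-itemsets L1
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(L)
    k = 1
    while L:
        # Candidate generation: join Lk with itself, then prune any
        # candidate that has an infrequent k-subset (Apriori property)
        prev = set(L)
        candidates = {
            a | b
            for a in prev for b in prev
            if len(a | b) == k + 1
            and all(frozenset(s) in prev for s in combinations(a | b, k))
        }
        # Scan the database once, counting the surviving candidates
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            items = set(t)
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        L = {s: c for s, c in counts.items() if c >= min_support}
        result.update(L)
        k += 1
    return result

# The TDB from the example slide, min_sup = 2
tdb = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
for itemset, count in sorted(apriori(tdb, 2).items(), key=lambda kv: len(kv[0])):
    print(set(itemset), count)   # ends with {'B', 'C', 'E'} 2

Running this on the example database reproduces the L1, L2, and L3 tables from the previous slides.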
Implementation of Apriori

How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
Example of candidate generation (a Python sketch follows this slide):
L3 = {abc, abd, acd, ace, bcd}
Self-joining: L3 * L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4 = {abcd}
16
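Below is a compact sketch of the candidate-generation step on its own, reproducing the L3 example above. As in the previous sketch, the union-based join is an assumed simplification of the prefix-based self-join, but the pruning step yields the same C4.

from itertools import combinations

def gen_candidates(Lk, k):
    """Self-join Lk, then prune candidates that have an infrequent k-subset."""
    joined = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
    return {c for c in joined
            if all(frozenset(s) in Lk for s in combinations(c, k))}

L3 = {frozenset(s) for s in ('abc', 'abd', 'acd', 'ace', 'bcd')}
print(gen_candidates(L3, 3))  # {frozenset({'a','b','c','d'})}; acde is pruned, ade not in L3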
Generating Association Rules

For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, generate the rule s => (l - s) if
support_count(l) / support_count(s) >= min_conf threshold.
Because the rules are generated from frequent itemsets, each one automatically satisfies the minimum support threshold.
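A minimal sketch of this procedure (not from the slides): given one frequent itemset and a table of support counts for it and all of its subsets, it emits every rule s => (l - s) whose confidence meets min_conf. The support counts below are illustrative values assumed for the {I1, I2, I5} example on the next slide.

from itertools import combinations

def gen_rules(l, support_count, min_conf):
    """Generate the strong rules s => (l - s) from one frequent itemset l."""
    l = frozenset(l)
    rules = []
    for r in range(1, len(l)):                       # every nonempty proper subset s
        for s in map(frozenset, combinations(l, r)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Illustrative support counts for {I1, I2, I5} and its subsets (assumed data)
counts = {frozenset(k): v for k, v in [
    (('I1',), 6), (('I2',), 7), (('I5',), 2),
    (('I1', 'I2'), 4), (('I1', 'I5'), 2), (('I2', 'I5'), 2),
    (('I1', 'I2', 'I5'), 2)]}
for antecedent, consequent, conf in gen_rules(('I1', 'I2', 'I5'), counts, 0.7):
    print(antecedent, '=>', consequent, f'(confidence = {conf:.0%})')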
Generating Association Rules: Example

Consider the frequent itemset X = {I1, I2, I5}.
Step 1: the nonempty subsets of X are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}.
Step 2: generate the candidate rules
{I1, I2} => {I5}, {I1, I5} => {I2}, {I2, I5} => {I1},
{I1} => {I2, I5}, {I2} => {I1, I5}, {I5} => {I1, I2}
and keep those whose confidence meets the min_conf threshold.
Misleading rules

Consider 10,000 transactions, of which 6,000 include computer games, 7,500 include videos, and 4,000 include both. For the rule computer games => videos:
Support = 4000/10000 = 40%
Confidence = 4000/6000 = 66%
Misleading rules cont …

The rule is misleading because the probability of purchasing videos is 75%, which is greater than the confidence of the rule (66%).
Computer games and videos are negatively correlated: the purchase of one of these items actually decreases the likelihood of purchasing the other.
Use another measure: lift.
Lift

Lift is a measure of correlation between items:
lift(A => B) = support(A ∪ B) / (support(A) × support(B)) = confidence(A => B) / support(B)
For the example above: lift = 0.40 / (0.60 × 0.75) ≈ 0.89 < 1, confirming that computer games and videos are negatively correlated.
Lift cont…

Lift assesses the degree to which the occurrence of one item (e.g. A) "lifts" the occurrence of the other (e.g. B):
If lift(A => B) = 1, the occurrence of A is independent of the occurrence of B; there is no association between the items.
If lift(A => B) < 1, the occurrence of A is negatively correlated with the occurrence of B, i.e. the occurrence of A decreases the chances of the occurrence of B by the factor lift(A => B).
If lift(A => B) > 1, the occurrence of A is positively correlated with the occurrence of B, i.e. the occurrence of A increases the chances of the occurrence of B by the factor lift(A => B).
Leverage:
leverage(X => Y) = support(X ∪ Y) - support(X) × support(Y)
Leverage measures the difference between the observed and the expected joint probability of X and Y.
A leverage value of 0 indicates that the occurrences of X and Y are independent of each other.

Conviction:
conviction(X => Y) = (1 - supp(Y)) / (1 - conf(X => Y)) = P(X)P(not Y) / P(X and not Y)
Conviction compares the probability that X appears without Y if they were independent with the actual frequency of the appearance of X without Y.
A conviction value of 1 indicates that the occurrences of X and Y are independent of each other.
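The following sketch evaluates the three measures on the games/videos example from the earlier slides (support(X) = 0.60, support(Y) = 0.75, support(X ∪ Y) = 0.40):

def lift(s_xy, s_x, s_y):
    return s_xy / (s_x * s_y)

def leverage(s_xy, s_x, s_y):
    return s_xy - s_x * s_y

def conviction(s_xy, s_x, s_y):
    conf = s_xy / s_x
    return float('inf') if conf == 1 else (1 - s_y) / (1 - conf)

# X = computer games, Y = videos
print(round(lift(0.40, 0.60, 0.75), 2))        # 0.89 (< 1: negative correlation)
print(round(leverage(0.40, 0.60, 0.75), 2))    # -0.05 (below independence)
print(round(conviction(0.40, 0.60, 0.75), 2))  # 0.75 (< 1: X occurs without Y more often than expected)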

23
In most cases, it is sufficient to focus on a combination of support, confidence, and either lift or leverage to quantitatively measure the "quality" of a rule. However, the real value of a rule, in terms of usefulness and actionability, is subjective and depends heavily on the particular domain and business objectives.

24
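The outline promises a demonstration of frequent itemset mining and rule generation in Python. Below is a minimal end-to-end sketch using the open-source mlxtend library on the five-transaction data set from the earlier slides; the choice of mlxtend is an assumption, since the slides do not name a library.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ['Beer', 'Nuts', 'Diaper'],
    ['Beer', 'Coffee', 'Diaper'],
    ['Beer', 'Diaper', 'Eggs'],
    ['Nuts', 'Eggs', 'Milk'],
    ['Nuts', 'Coffee', 'Diaper', 'Eggs', 'Milk'],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets with support >= 60%, then rules with confidence >= 75%
frequent = apriori(onehot, min_support=0.6, use_colnames=True)
rules = association_rules(frequent, metric='confidence', min_threshold=0.75)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

With these thresholds the output contains the two rules from slide 10: Beer => Diaper (confidence 100%) and Diaper => Beer (confidence 75%).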
