Module 2
Association Rules
Frequent Patterns: Patterns that appear frequently in a dataset.

Association rule mining is a technique used to identify patterns in large data sets. It
involves finding relationships between variables in the data and using those
relationships to make predictions or decisions. The goal of association rule mining is
to uncover rules that describe the relationships between different items in the data
set.
Association rules are if/then statements that help uncover relationships between
seemingly unrelated data in a relational database or another information repository.
An example of an association rule is "If a customer buys a dozen eggs, they are 80% likely to also purchase milk" (market basket analysis).

How does Association Rule Learning work?


Association rule learning works on the concept of if/then statements, such as "if A, then B."

The "if" part is called the antecedent, and the "then" part is called the consequent. A relationship involving a single antecedent item and a single consequent item is said to have single cardinality; as the number of items in a rule increases, the cardinality increases accordingly. To measure the strength of associations across thousands of data items, several metrics are used.

The metrics are:

Support
Support is the frequency of an item or itemset, i.e., how often it appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X:

Support(X) = (Number of transactions containing X) / (Total number of transactions)

Confidence
Confidence indicates how often the rule has been found to be true: how often items X and Y occur together in the dataset, given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X. In other words, it measures the likelihood that the consequent is purchased when the antecedent is purchased:

Confidence(X ⇒ Y) = Support(X ∪ Y) / Support(X)

Lift
Lift measures the strength of a rule. It is the ratio of the observed support to the support expected if X and Y were independent of each other:

Lift(X ⇒ Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It has three possible cases:
If Lift = 1: the occurrence of the antecedent and the consequent are independent of each other.

If Lift > 1: the two itemsets are positively dependent on each other; the antecedent makes the consequent more likely.

If Lift < 1: one item is a substitute for the other, meaning one item has a negative effect on the occurrence of the other.
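The three metrics can be computed directly from transaction data. The following is a minimal sketch; the transactions and item names are illustrative, not taken from this module:

```python
# Minimal sketch of support, confidence, and lift computed from raw transactions.
# The transactions below are illustrative examples, not data from these notes.

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of (antecedent ∪ consequent) divided by support of antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Observed support over the support expected under independence."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"milk", "bread"}))        # 0.6
print(confidence({"milk"}, {"bread"}))   # 0.75
print(lift({"milk"}, {"bread"}))         # 0.9375 (slightly below 1)
```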

Market Basket Analysis

Market basket analysis is a technique used in data mining and retail analytics to
identify relationships and patterns in customer purchasing behavior. It involves
analyzing transactional data, typically from point-of-sale systems, to uncover
associations between products that are frequently purchased together. The goal
is to understand the co-occurrence of items in a customer's shopping basket and
to provide insights that can be used for various purposes, such as product
recommendations, store layout optimization, and targeted marketing strategies.

Techniques Used in Market Basket Analysis

Terms in analysis

Itemset (I): A collection of items represented as a set {I1, I2, ..., Im}.

Database transactions (D): A set of transactions, where each transaction T is a nonempty itemset and T is a subset of the itemset I. Each transaction is associated with a unique identifier called a TID.

Association rule: An association rule is an implication of the form A ⇒ B, where A and B are itemsets. A and B are subsets of I, both are non-empty, and they share no common items (A ∩ B = ∅). Association rules describe relationships between items in transactions.

Support (s): The support of an association rule A ⇒ B is the percentage of transactions in the database D that contain the union of sets A and B (A ∪ B). It is also considered the probability P(A ∪ B).

Confidence (c): The confidence of an association rule A ⇒ B is the percentage of transactions in D containing A that also contain B. It is the conditional probability P(B|A).

Strong rules: Association rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are considered strong. These thresholds are set based on the desired significance level.

Occurrence frequency: The occurrence frequency of an itemset is the count of transactions in D that contain the itemset. It is also referred to as the support count of the itemset.

Frequent itemsets: Itemsets whose relative support (the proportion of transactions containing them) satisfies a minimum support threshold are considered frequent. The set of frequent k-itemsets is denoted by Lk.

Pruning: Pruning techniques can be applied to remove uninteresting or redundant rules, enhancing the quality and interpretability of the generated rules.

Evaluation and Selection: The generated rules are evaluated based
on measures such as support, confidence, lift, and other metrics. The
selection of rules is based on the desired quality and significance
criteria.

Interpretation and Application: The discovered association rules provide insight into patterns and relationships among items in the transactional data. They can be utilized for various purposes, such as product recommendations, cross-selling, pricing strategies, and targeted marketing campaigns.

Frequent Item-sets
Frequent itemsets are sets of items that occur together in transactions at or above a specified minimum support threshold. The support of an itemset is the proportion of transactions that contain all the items in the set. By identifying frequent itemsets, retailers can uncover patterns and associations among items that are commonly purchased together.

For example, if the minimum support threshold is set to 5%, an itemset containing "bread" and "milk" that appears in 7% of all transactions would be considered a frequent itemset.

Frequent itemsets are typically discovered using algorithms such as Apriori or FP-Growth, which efficiently traverse the transactional dataset to find itemsets that meet the minimum support criterion.
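To make the definition concrete, here is a brute-force sketch that enumerates every candidate itemset and keeps those meeting the threshold (the transactions and threshold are illustrative). The Apriori algorithm described below avoids this exhaustive enumeration:

```python
from itertools import combinations

# Brute-force sketch: test every possible itemset against min_support.
# Illustrative only; Apriori avoids examining all 2^m candidate itemsets.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
min_support = 0.5  # itemset must appear in at least half of all transactions

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for candidate in combinations(items, k):
        count = sum(set(candidate) <= t for t in transactions)
        if count / len(transactions) >= min_support:
            frequent[candidate] = count

print(frequent)  # e.g. ('bread',): 3, ('milk',): 3, ('bread', 'milk'): 2
```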

Closed Item-sets
Closed itemsets are frequent itemsets that have no proper superset with the same support. In other words, a closed itemset is one for which no larger itemset containing the same items occurs in just as many transactions. Closed itemsets capture the essential associations without redundancy.

For example, if the itemset {A, B, C} has a support of 0.1, and no superset of {A, B, C} has a support of 0.1, then {A, B, C} is a closed itemset.

Closed itemsets are useful because they provide a more concise representation of frequent itemsets and simplify the interpretation of association rules.
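A minimal sketch of this check, assuming a dictionary of frequent itemsets with illustrative support counts:

```python
# Minimal sketch of identifying closed itemsets among frequent itemsets.
# `frequent` maps an itemset (frozenset) to its support count; values are illustrative.

frequent = {
    frozenset({"A"}): 4,
    frozenset({"B"}): 3,
    frozenset({"A", "B"}): 3,
    frozenset({"A", "B", "C"}): 2,
}

def is_closed(itemset, frequent):
    """An itemset is closed if no proper superset has the same support."""
    return not any(
        itemset < other and frequent[other] == frequent[itemset]
        for other in frequent
    )

for itemset in frequent:
    print(sorted(itemset), "closed" if is_closed(itemset, frequent) else "not closed")
# {B} is not closed: its superset {A, B} has the same support count (3).
```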

Association rule mining

Association rule mining involves two main steps. First, all frequent itemsets are identified by applying a minimum support threshold. Then, strong association rules are generated from the frequent itemsets by applying the minimum confidence threshold.

Apriori Algorithm
Finding Frequent Itemsets by Confined Candidate Generation

Consider the following dataset; we will find the frequent itemsets and generate association rules for them.

Minimum support count is 2 and minimum confidence is 60%.
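The transaction table itself appears as an image in the original notes. The support counts used in the steps below (e.g., sup(I1) = 6, sup(I2) = 7, sup{I1, I2} = 4) are consistent with the classic nine-transaction example, reproduced here under that assumption:

TID    Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3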

Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).

(II) Compare each candidate item's support count with the minimum support count (here min_support = 2; if the support count of a candidate item is less than min_support, remove that item). This gives us the itemset L1.

Step-2: K=2

Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the itemsets should have (K-2) elements in common.

Check whether all subsets of each itemset are frequent, and if not, remove that itemset. (For example, the subsets of {I1, I2} are {I1} and {I2}, which are frequent. Check this for each itemset.)

Now find the support count of these itemsets by searching the dataset.

(II) Compare the candidate set (C2) support counts with the minimum support count (here min_support = 2; if the support count of a candidate itemset is less than min_support, remove it). This gives us the itemset L2.

Step-3:

Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the itemsets should have (K-2) elements in common, so here, for L2, the first element should match. The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, and {I2, I3, I5}.

Check whether all subsets of these itemsets are frequent, and if not, remove that itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, and {I1, I3}, which are all frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Similarly, check every itemset.)

Find the support count of the remaining itemsets by searching the dataset.

(II) Compare the candidate set (C3) support counts with the minimum support count (here min_support = 2; if the support count of a candidate itemset is less than min_support, remove it). This gives us the itemset L3.

Step-4:

Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K=4) is that the itemsets should have (K-2) elements in common, so here, for L3, the first two elements (items) should match.

Check whether all subsets of these itemsets are frequent. (Here the itemset formed by joining L3 is {I1, I2, I3, I5}, and its subsets include {I1, I3, I5}, which is not frequent.) So there is no itemset in C4.

We stop here because no further frequent itemsets are found.
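The level-wise procedure just walked through can be sketched compactly in Python. This is a minimal illustration, assuming the nine-transaction table reproduced above:

```python
from itertools import combinations

# Sketch of the level-wise Apriori procedure described above, run on the
# nine-transaction table assumed earlier (an image in the original notes).

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support = 2  # minimum support count

def support_count(itemset):
    return sum(itemset <= t for t in transactions)

def join(prev_frequent, k):
    """Join step: merge (k-1)-itemsets into k-candidates, then prune any
    candidate that has an infrequent (k-1)-subset (Apriori property)."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev_frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

# L1: frequent 1-itemsets
level = {frozenset({i}) for t in transactions for i in t}
level = {c for c in level if support_count(c) >= min_support}
k = 2
while level:
    print(f"L{k - 1}:", [sorted(c) for c in sorted(level, key=sorted)])
    level = {c for c in join(level, k) if support_count(c) >= min_support}
    k += 1
# Prints L1 (all five items), L2 (six pairs), and
# L3 ({I1, I2, I3} and {I1, I2, I5}), then stops: C4 is empty.
```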

Generating Association Rules from Frequent Item-Sets

Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes into the picture. For that, we need to calculate the confidence of each rule.

Confidence:
A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.

Confidence(A => B) = Support_count(A ∪ B) / Support_count(A)

So here, taking one frequent itemset as an example, we will show the rule generation.

Itemset {I1, I2, I3} // from L3

So the rules can be:

[I1 ^ I2] => [I3] // confidence = sup(I1 ^ I2 ^ I3) / sup(I1 ^ I2) = 2/4 × 100 = 50%
[I1 ^ I3] => [I2] // confidence = sup(I1 ^ I2 ^ I3) / sup(I1 ^ I3) = 2/4 × 100 = 50%
[I2 ^ I3] => [I1] // confidence = sup(I1 ^ I2 ^ I3) / sup(I2 ^ I3) = 2/4 × 100 = 50%
[I1] => [I2 ^ I3] // confidence = sup(I1 ^ I2 ^ I3) / sup(I1) = 2/6 × 100 = 33%
[I2] => [I1 ^ I3] // confidence = sup(I1 ^ I2 ^ I3) / sup(I2) = 2/7 × 100 = 28%
[I3] => [I1 ^ I2] // confidence = sup(I1 ^ I2 ^ I3) / sup(I3) = 2/6 × 100 = 33%

So if the minimum confidence is 50%, the first three rules can be considered strong association rules.
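The same rule generation can be sketched in code. The support counts below are the ones derived in the walkthrough above:

```python
from itertools import combinations

# Sketch of rule generation from one frequent itemset, using the support
# counts derived above: sup{I1,I2,I3}=2, sup{I1,I2}=4, sup{I1,I3}=4,
# sup{I2,I3}=4, sup{I1}=6, sup{I2}=7, sup{I3}=6.

support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I2", "I3"}): 2,
}

itemset = frozenset({"I1", "I2", "I3"})
min_confidence = 0.5

# Every non-empty proper subset of the itemset can serve as an antecedent.
for r in range(1, len(itemset)):
    for antecedent in map(frozenset, combinations(sorted(itemset), r)):
        consequent = itemset - antecedent
        conf = support_count[itemset] / support_count[antecedent]
        strong = "strong" if conf >= min_confidence else "weak"
        print(f"{sorted(antecedent)} => {sorted(consequent)}: "
              f"{conf:.0%} ({strong})")
# The three rules with two-item antecedents reach 50% and come out strong,
# matching the walkthrough above.
```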

Improving the Efficiency of Apriori

To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used, which reduces the search space.

Apriori property:
All non-empty subsets of a frequent itemset must be frequent. The key concept of the Apriori algorithm is the anti-monotonicity of the support measure: if an itemset is infrequent, all of its supersets will be infrequent.
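This property justifies the prune step used in candidate generation above; a minimal sketch of the subset check:

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """Prune step: a k-candidate is viable only if every (k-1)-subset
    of it was found frequent at the previous level (Apriori property)."""
    k = len(candidate)
    return any(
        frozenset(s) not in prev_frequent
        for s in combinations(candidate, k - 1)
    )

# Example from the walkthrough: {I3, I4} was infrequent, so any candidate
# containing it, such as {I2, I3, I4}, is pruned without scanning the data.
L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(has_infrequent_subset(frozenset({"I2", "I3", "I4"}), L2))  # True -> prune
```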
