Unit - III
Association rule mining is a technique for uncovering hidden relationships among items in large datasets. It identifies frequent patterns (itemsets that occur together in many transactions) and the associations among them, and expresses each association as a rule that shows how frequently an itemset occurs across the transactions. The process of identifying such associations between products/items is what gives the technique its name.
It is a popular method in data mining and machine learning and has a wide range of applications in various
fields, such as market basket analysis, customer segmentation, and fraud detection.
Motivation:
Finding inherent regularities in data
What products were often purchased together?
Milk and Bread?
Apriori Algorithm
The Apriori algorithm is one of the most widely used algorithms for association rule mining.
It works by first identifying the frequent itemsets in the dataset (itemsets that appear in at
least a minimum number of transactions).
It then uses these frequent itemsets to generate association rules, which are statements of the
form "if item A is purchased, then item B is also likely to be purchased."
The Apriori algorithm uses a bottom-up approach, starting with individual items and
gradually building up to more complex itemsets.
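As a sketch of this bottom-up loop, here is a minimal Apriori in Python (an illustrative implementation, not these notes' own code: it counts candidates by plain set scans rather than the hash-tree counting of the original algorithm, and its union-based join produces the same candidates as the prefix join described in the steps below):

from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets.

    transactions: a list of sets of items; min_support: an absolute count.
    A minimal sketch: no candidate-counting optimizations.
    """
    # L1: count every individual item, keep those meeting min_support
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets that share
        # k-2 items, i.e. whose union has exactly k items
        keys = list(current)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Scan the dataset to count support; keep those meeting min_support
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

Run on the nine-transaction example worked through below with min_support = 2, this returns the same L1, L2, and L3 that the step-by-step derivation produces.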
The Apriori algorithm has become a standard method for frequent itemset mining and association
rule learning. It has been applied to a variety of tasks, including market basket analysis,
recommendation systems, and fraud detection, and has inspired the development of many other
algorithms for similar problems.
Steps for Apriori Algorithm
Example: Suppose we have the following transaction dataset; from it, we need to find the frequent
itemsets and generate the association rules using the Apriori algorithm:
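The transaction table itself did not survive in these notes, but the support counts used throughout the worked example (e.g. sup(I1) = 6, sup(I2) = 7, sup(I1^I2) = 4) match the classic nine-transaction dataset with min_support = 2, so it is reconstructed here:

TID  Items
T1   I1, I2, I5
T2   I2, I4
T3   I2, I3
T4   I1, I2, I4
T5   I1, I3
T6   I2, I3
T7   I1, I3
T8   I1, I2, I3, I5
T9   I1, I2, I3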
Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset. This table is
called the candidate set C1.
(II) Compare each candidate item's support count with the minimum support count (here
min_support = 2); if an item's support count is less than min_support, remove it.
This gives us the frequent itemset L1.
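A minimal sketch of this counting-and-pruning step in Python, assuming the reconstructed transaction table above (the variable names are illustrative):

from collections import Counter

transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'},
    {'I1', 'I2', 'I4'}, {'I1', 'I3'}, {'I2', 'I3'},
    {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'}, {'I1', 'I2', 'I3'},
]
min_support = 2

# C1: support count of every individual item in the dataset
c1 = Counter(item for t in transactions for item in t)
# L1: keep only items whose support count meets min_support
l1 = {item: n for item, n in c1.items() if n >= min_support}
# Here every item survives: I1:6, I2:7, I3:6, I4:2, I5:2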
Step-2: K=2
(I) Generate candidate set C2 using L1 (this is called the join step). The condition for joining
Lk-1 with Lk-1 is that the itemsets have (K-2) elements in common.
Check whether all subsets of each itemset are frequent, and if not, remove that itemset.
(Example: the subsets of {I1, I2} are {I1} and {I2}, which are frequent. Check this for each
itemset.)
Now find the support count of these itemsets by searching the dataset.
(II) Compare each C2 candidate's support count with the minimum support count (here min_support = 2);
if a candidate's support count is less than min_support, remove it. This gives us the frequent itemset L2.
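Continuing the sketch from the previous snippet, the join and support counting for K = 2 (at this size the subset check is automatic, since every 1-subset of a C2 candidate is already in L1 by construction):

from itertools import combinations

# C2: join L1 with itself, i.e. every 2-item combination of frequent items
c2 = [frozenset(pair) for pair in combinations(sorted(l1), 2)]
# Count each candidate's support with one scan over the dataset
c2_counts = {c: sum(1 for t in transactions if c <= t) for c in c2}
# L2: prune candidates below min_support
l2 = {c: n for c, n in c2_counts.items() if n >= min_support}
# Survivors: {I1,I2}:4, {I1,I3}:4, {I1,I5}:2, {I2,I3}:4, {I2,I4}:2, {I2,I5}:2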
Step-3: K=3
(I) Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that
the itemsets have (K-2) elements in common, so here, for L2, the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4},
{I2, I4, I5}, and {I2, I3, I5}.
Check whether all subsets of these itemsets are frequent, and if not, remove that itemset. (Here the
subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, and {I1, I3}, which are frequent. For {I2, I3, I4},
the subset {I3, I4} is not frequent, so remove it. Check every itemset similarly.)
Find the support count of the remaining itemsets by searching the dataset.
(II) Compare each C3 candidate's support count with the minimum support count (here min_support = 2);
if a candidate's support count is less than min_support, remove it. This gives us the frequent
itemset L3.
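The K = 3 round of the same sketch, with the "first element must match" join condition and the subset-pruning test written out (continues the snippets above):

# C3 join: pair up L2 itemsets whose first (K-2 = 1) element matches
pairs = [tuple(sorted(s)) for s in l2]
c3 = {frozenset(a) | frozenset(b)
      for a in pairs for b in pairs
      if a < b and a[0] == b[0]}
# Prune: every 2-item subset of a candidate must itself be in L2
c3 = {c for c in c3 if all(frozenset(s) in l2 for s in combinations(c, 2))}
# Count support of the survivors ({I1,I2,I3} and {I1,I2,I5}) and prune
l3 = {c: sum(1 for t in transactions if c <= t) for c in c3}
l3 = {c: n for c, n in l3.items() if n >= min_support}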
Step-4: K=4
(I) Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K=4)
is that the itemsets have (K-2) elements in common, so here, for L3, the first two items should
match.
Check whether all subsets of these itemsets are frequent. (Here the only itemset formed by joining
L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent.) So there is no itemset in C4.
We stop here because no further frequent itemsets are found.
Thus, we have discovered all the frequent itemsets. Now the generation of strong association
rules comes into the picture. For that we need to calculate the confidence of each rule.
Confidence:
Confidence(A => B) = Support_count(A ∪ B) / Support_count(A)
For example, a confidence of 60% means that 60% of the customers who purchased milk and
bread also bought butter.
So here, taking one frequent itemset as an example, we will show the rule generation.
Itemset {I1, I2, I3} // from L3
So the rules can be:
[I1^I2] => [I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 * 100 = 50%
[I1^I3] => [I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 * 100 = 50%
[I2^I3] => [I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 * 100 = 50%
[I1] => [I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 * 100 = 33%
[I2] => [I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 * 100 = 28%
[I3] => [I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 * 100 = 33%
So if the minimum confidence is 50%, the first three rules can be considered strong association
rules.
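A sketch of this last step in the same Python setting: enumerate every antecedent => consequent split of a frequent itemset and keep the rules that reach the minimum confidence (rules_from is an illustrative helper, not a standard API; transactions continues from the snippets above):

from itertools import combinations

def rules_from(itemset, transactions, min_confidence=0.5):
    """Yield (antecedent, consequent, confidence) for each strong rule."""
    def support(s):
        # Absolute support count: transactions containing all of s
        return sum(1 for t in transactions if s <= t)
    whole = support(itemset)
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = whole / support(antecedent)
            if conf >= min_confidence:
                yield antecedent, itemset - antecedent, conf

# For {I1, I2, I3} this keeps exactly the three 50% rules listed above
for a, c, conf in rules_from(frozenset({'I1', 'I2', 'I3'}), transactions):
    print(sorted(a), '=>', sorted(c), f'{conf:.0%}')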
Association Rule Mining Algorithms