
Unit - III

Association Rule Mining


Introduction

 Association rule mining finds interesting associations and relationships among items in large datasets.
 It is a technique used to uncover hidden relationships between variables and to identify frequent patterns and associations among a set of items.
 An association rule shows how frequently an itemset occurs in transactions.
 The process of identifying associations between products/items is called association rule mining.
 It is a popular method in data mining and machine learning, with a wide range of applications in fields such as market basket analysis, customer segmentation, and fraud detection.
Motivation

 Finding inherent regularities in data:
 What products were often purchased together? Milk and bread?
 What are the subsequent purchases after buying a PC?
 Jeans => T-shirt; Laptop or PC => anti-virus software
 What kinds of DNA are sensitive to this new drug?
 Is there any correlation among web documents?
 Can we automatically detect SPAM email?
Use Cases of Association Rule Mining:

Market Basket Analysis


 One of the most well-known applications of association rule mining is in market basket analysis.
 This involves analyzing the items customers purchase together to understand their purchasing
habits and preferences.
 For example, a retailer might use association rule mining to discover that customers who purchase
milk are also likely to purchase bread.
 We can use this information to optimize product placements and promotions to increase sales.
 Customer Segmentation
 Association rule mining can also be used to segment customers based on their purchasing habits.
 For example, a company might use association rule mining to discover that customers who
purchase certain types of products are more likely to be younger.
 Similarly, they could learn that customers who purchase certain combinations of products are more
likely to be located in specific geographic regions.
 We can use this information to tailor marketing campaigns and personalized recommendations to
specific customer segments.
 Fraud Detection
 We can also use association rule mining to detect fraudulent activity.
 For example, a credit card company might use association rule mining to identify patterns of
fraudulent transactions, such as multiple purchases from the same merchant within a short
period of time.
 We can then use this information to flag potentially fraudulent activity and take
preventative measures to protect customers.
 Social network analysis
 Various companies use association rule mining to identify patterns in social
media data that can inform the analysis of social networks.
 For example, an analysis of Twitter data might reveal that users who tweet about
a particular topic are also likely to tweet about other related topics, which could
inform the identification of groups or communities within the network.
 Recommendation systems
 Association rule mining can be used to suggest items that a customer might be
interested in based on their past purchases or browsing history.
 For example, a music streaming service might use association rule mining to
recommend new artists or albums to a user based on their listening history.
Association Rule Mining Algorithms

 Apriori algorithm
 The Apriori algorithm is one of the most widely used algorithms for association rule mining.
 It works by first identifying the frequent itemsets in the dataset (itemsets that appear in at least a
minimum number of transactions).
 It then uses these frequent itemsets to generate association rules, which are statements of the
form "if item A is purchased, then item B is also likely to be purchased."
 The Apriori algorithm uses a bottom-up approach, starting with individual items and
gradually building up to more complex itemsets.
Apriori Algorithm

 The Apriori algorithm has become one of the most widely used algorithms for frequent itemset
mining and association rule learning.
 It has been applied to a variety of applications, including market basket analysis, recommendation
systems, and fraud detection, and has inspired the development of many other algorithms for
similar tasks.
Steps for Apriori Algorithm

 Below are the steps of the Apriori algorithm:

 Step-1: Scan the transactional database to determine the support of each itemset, and select the minimum support and minimum confidence thresholds.
 Step-2: Keep all itemsets in the transactions whose support is higher than the minimum (selected) support value; these are the frequent itemsets.
 Step-3: From these frequent itemsets, find all rules whose confidence is higher than the minimum confidence threshold.
 Step-4: Sort the rules in decreasing order of lift.
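The four steps above can be sketched in Python as follows; the tiny transaction set and thresholds are hypothetical examples chosen for illustration, and candidates are enumerated only up to size 2 to keep the sketch short:

```python
from itertools import combinations

# Hypothetical transactions and thresholds for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
]
min_support = 2       # Step-1: minimum support count
min_confidence = 0.6  # Step-1: minimum confidence

def support_count(itemset):
    # Number of transactions containing every item of `itemset`.
    return sum(1 for t in transactions if set(itemset) <= t)

# Step-2: keep itemsets (here up to size 2) meeting the minimum support.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset(c)
            for k in (1, 2)
            for c in combinations(items, k)
            if support_count(c) >= min_support]

# Step-3: generate rules A => B whose confidence clears the threshold.
rules = []
for itemset in (s for s in frequent if len(s) > 1):
    for item in itemset:
        antecedent = itemset - {item}
        conf = support_count(itemset) / support_count(antecedent)
        if conf >= min_confidence:
            rules.append((antecedent, frozenset({item}), conf))

# Step-4: sort the rules in decreasing order of lift.
n = len(transactions)
def lift(rule):
    a, b, _ = rule
    return (support_count(a | b) / n) / ((support_count(a) / n) * (support_count(b) / n))
rules.sort(key=lift, reverse=True)
```

A real Apriori implementation repeats the candidate-generation and counting steps for increasing itemset sizes instead of enumerating fixed-size candidates, as the worked example below shows.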
Advantages of Apriori Algorithm
•It is an easy-to-understand algorithm.
•The join and prune steps of the algorithm can be easily implemented on large datasets.

Disadvantages of Apriori Algorithm

•The Apriori algorithm works slowly compared to other algorithms.
•Overall performance can be reduced because it scans the database multiple times.
•The time complexity and space complexity of the Apriori algorithm are O(2^D), which is very high.
Here D represents the horizontal width (number of distinct items) present in the database.
Apriori Algorithm Working

 Example: Suppose we have the following dataset that has various transactions, and from this dataset, we need to
find the frequent itemsets and generate the association rules using the Apriori algorithm:

Minimum support count = 2
Minimum confidence = 60%
Solution:

 Step-1: K=1
 (I) Create a table containing the support count of each item present in the dataset. This table is called the candidate set C1.
 (II) Compare each candidate item's support count with the minimum support count (here min_support = 2); if the support count of a candidate item is less than min_support, remove that item.
 This gives us the itemset L1.
 Step-2: K=2
 Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the itemsets should have (K-2) elements in common.
 Check whether all subsets of each candidate itemset are frequent; if not, remove that itemset. (For example, the subsets of {I1, I2} are {I1} and {I2}, which are frequent. Check this for each itemset.)
 Now find the support count of these itemsets by searching the dataset.
 (II) Compare the candidate (C2) support counts with the minimum support count (here min_support = 2); if the support count of a candidate itemset is less than min_support, remove it. This gives us the itemset L2.
 Step-3:
 Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the itemsets should have (K-2) elements in common, so here, for L2, the first element should match.
 The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, and {I2, I3, I5}.
 Check whether all subsets of these itemsets are frequent; if not, remove that itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, and {I1, I3}, which are frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Similarly, check every itemset.)
 Find the support count of the remaining itemsets by searching the dataset.
 (II) Compare the candidate (C3) support counts with the minimum support count (here min_support = 2); if the support count of a candidate itemset is less than min_support, remove it. This gives us the itemset L3.
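The join and prune steps used in Steps 2 and 3 can be sketched as follows. The L2 below is inferred from the joined itemsets listed above (it is not printed explicitly on the slides):

```python
from itertools import combinations

# L2 inferred from the worked example above (pairs kept lexicographically sorted).
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

def join(prev_frequent, k):
    # Join step: merge two (k-1)-itemsets that share their first k-2 items.
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            if a < b and a[:k - 2] == b[:k - 2]:
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

def prune(candidates, prev_frequent, k):
    # Prune step: drop candidates having any infrequent (k-1)-subset.
    prev = set(prev_frequent)
    return {c for c in candidates
            if all(s in prev for s in combinations(c, k - 1))}

C3 = join(L2, k=3)                 # the six 3-itemsets listed above
L3_candidates = prune(C3, L2, k=3)
# Only {I1, I2, I3} and {I1, I2, I5} survive; each of the other four
# contains an infrequent 2-subset such as {I3, I4} or {I3, I5}.
```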
 Step-4:
 Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K=4) is that they should have (K-2) elements in common, so here, for L3, the first 2 items should match.
 Check whether all subsets of these itemsets are frequent. (Here the itemset formed by joining L3 is {I1, I2, I3, I5}, whose subsets include {I1, I3, I5}, which is not frequent.) So there is no itemset in C4.
 We stop here because no further frequent itemsets are found.
 Thus, we have discovered all the frequent itemsets. Now the generation of strong
association rules comes into the picture. For that, we need to calculate the confidence of
each rule.
 Confidence:
For example, a confidence of 60% would mean that 60% of the customers who purchased milk and
bread also bought butter.
 Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
So here, taking one frequent itemset as an example, we show the rule generation.
Itemset {I1, I2, I3} // from L3
The candidate rules are:
[I1^I2] => [I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 * 100 = 50%
[I1^I3] => [I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 * 100 = 50%
[I2^I3] => [I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 * 100 = 50%
[I1] => [I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 * 100 = 33%
[I2] => [I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 * 100 = 28%
[I3] => [I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 * 100 = 33%
So if the minimum confidence is 50%, the first 3 rules can be considered strong association
rules.
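The confidence arithmetic above can be checked with a short sketch; the support counts are copied from the worked example:

```python
# Support counts taken from the worked example above.
support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4,
    frozenset({"I1", "I2", "I3"}): 2,
}

def confidence(antecedent, consequent):
    # Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
    a = frozenset(antecedent)
    return support[a | frozenset(consequent)] / support[a]

min_confidence = 0.5
for a, b in [(("I1", "I2"), ("I3",)), (("I1",), ("I2", "I3")),
             (("I2",), ("I1", "I3"))]:
    c = confidence(a, b)
    print(a, "=>", b, f"{c:.0%}", "strong" if c >= min_confidence else "weak")
```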
Association Rule Mining Algorithms

 Apriori: A Candidate Generation-and-Test Approach

 FPGrowth: A Frequent Pattern-Growth Approach

 ECLAT: Frequent Pattern Mining with Vertical Data Format


 FP-Growth algorithm
 The FP-Growth (Frequent Pattern Growth) algorithm is another popular algorithm for association
rule mining.
 It works by constructing a tree-like structure called an FP-tree, which compactly encodes the frequent itemsets
in the dataset.
 The FP-tree is then used to generate association rules in a similar manner to the Apriori algorithm.
 The FP-Growth algorithm is generally faster than the Apriori algorithm, especially for large
datasets.
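The FP-tree construction step can be sketched as follows (the recursive mining phase is omitted); the transactions and threshold are hypothetical examples:

```python
# Hypothetical transactions and minimum support for illustration.
transactions = [["milk", "bread"], ["milk", "bread", "butter"],
                ["bread", "butter"], ["milk", "butter"]]
min_support = 2

# Pass 1: count item frequencies and keep only frequent items.
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1
frequent = {i: c for i, c in counts.items() if c >= min_support}

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

# Pass 2: insert each transaction with items ordered by descending
# frequency, so transactions sharing a prefix share a tree path.
root = Node(None)
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i)):
        node = node.children.setdefault(item, Node(item))
        node.count += 1
```

Because frequent prefixes are shared, the tree is usually much smaller than the transaction list, which is what lets FP-Growth mine it without rescanning the database.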
 ECLAT algorithm
 The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a
variation of the Apriori approach that uses a depth-first search over a vertical data format,
rather than Apriori's breadth-first search over horizontal transaction data.
 It works by mapping each item to the set of transaction IDs (its TID-set) in which it appears;
the support of an itemset is the size of the intersection of its items' TID-sets.
 Larger itemsets are then explored by intersecting the TID-sets of itemsets within the same
equivalence class (itemsets sharing a common prefix) in the lattice.
 It is generally more efficient and scalable than the Apriori algorithm, since it avoids repeated
full database scans.
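ECLAT's vertical data format can be illustrated with a short sketch; the transactions are hypothetical examples:

```python
# Hypothetical transactions, keyed by transaction ID.
transactions = {
    1: {"milk", "bread"},
    2: {"milk", "bread", "butter"},
    3: {"bread", "butter"},
    4: {"milk", "butter"},
}

# Vertical format: item -> TID-set (the transactions containing the item).
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

# Support of {milk, bread} = size of the intersection of their TID-sets;
# ECLAT extends itemsets depth-first, intersecting TID-sets as it goes.
support = len(tidsets["milk"] & tidsets["bread"])  # {1, 2, 4} ∩ {1, 2, 3}
```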
