Association Rule

The document discusses frequent pattern mining, focusing on techniques for discovering associations and correlations in datasets, particularly through Market Basket Analysis (MBA). It outlines the steps involved in MBA, including data gathering, preprocessing, and the generation of association rules, as well as the application of the Apriori algorithm for efficient mining of frequent itemsets. Additionally, it highlights various applications of association rule mining across different industries, emphasizing its significance in decision-making and business strategies.

Uploaded by

William D2
Copyright
© © All Rights Reserved
Mining Frequent Patterns, Association Rule Mining

Frequent patterns refer to recurring patterns like itemsets,
subsequences, or substructures that occur frequently within a dataset. For
instance, a set of items like milk and bread frequently appearing together in
transaction data constitutes a frequent itemset. Similarly, a subsequence,
such as purchasing a PC, followed by a digital camera, and then a memory
card, is considered a frequent sequential pattern if it occurs often in a
shopping history database. A substructure can take various forms like
subgraphs, subtrees, or sublattices, and when it appears frequently, it is
termed a frequent structured pattern. The discovery of such frequent patterns
plays a crucial role in uncovering associations, correlations, and interesting
relationships in data. Furthermore, it aids in data classification, clustering, and
other data mining tasks. As a result, frequent pattern mining has gained
prominence as a vital data mining task and a focal point in data mining
research.

This module presents an introduction to the notions of frequent patterns,
associations, and correlations, as well as an examination of their efficient
mining techniques.

Learning Objectives
After studying this module, you will be able to:
1. Understand the concept of frequent pattern mining and how it is
used; and
2. Know different mining procedures to discover interesting associations
and correlations.

Lesson 1: Market Basket Analysis

Frequent itemset mining is a valuable technique used to reveal
associations and correlations within extensive transactional or relational
datasets. As data continues to accumulate in various industries, there is a
growing interest in mining these patterns from databases. The identification of
meaningful correlation relationships in vast business transaction records can
significantly impact decision-making processes, such as catalog design, cross-
marketing, and customer shopping behavior analysis.

Market Basket Analysis (MBA) is a data mining technique used to
discover associations and relationships between items that customers
frequently purchase together. It is primarily applied in the retail industry to
gain insights into customer buying behavior and to identify patterns of co-
occurrence among products. Market basket analysis serves as a typical
example of frequent itemset mining. This process involves scrutinizing
customer purchasing habits to identify associations between items frequently
placed together in their "shopping baskets" (as shown in Figure 3.1). The
resulting associations empower retailers to formulate effective marketing
strategies based on insights into items often purchased together by
customers.

The key concept in market basket analysis is the notion of "frequent
itemsets." These are sets of items whose support meets or exceeds a
predefined minimum support threshold. The support of an itemset is the
proportion of transactions that contain all the items in that set. The higher the
support, the more frequently the itemset occurs, and the more significant it is
considered.
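
The support calculation can be sketched in a few lines of Python. This is only an illustration: the baskets and the 0.5 threshold below are invented for the example and do not come from the module.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

min_support = 0.5  # assumed threshold
s = support({"milk", "bread"}, baskets)
print(s, s >= min_support)  # 0.5 True
```

Here {milk, bread} appears in two of the four baskets, so its support is 0.5 and it just meets the assumed threshold.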

MODULE 3 | Mining Frequent Patterns, Association and Correlations 37


Figure 3.1 Market Basket Analysis

How does Market Basket Analysis Work?

1. Gather transactional data: Acquire data related to customer
transactions, including details of purchased items, transaction
timestamps, and other pertinent information.
2. Preprocess the data: Cleanse and preprocess the data, eliminating
irrelevant information, handling missing values, and transforming it
into a format suitable for analysis.
3. Discover frequent item sets: Employ association rule mining
algorithms like Apriori or FP-Growth to detect frequent item sets,
which are sets of items frequently occurring together in transactions.
4. Compute support and confidence measures: Calculate the support
of each frequent item set (how often its items are bought together)
and the confidence of each candidate rule (how often the consequent
is bought when the antecedent is bought).
5. Generate association rules: Derive association rules based on the
frequent item sets and their corresponding support and confidence
values. These rules indicate the likelihood of purchasing one item
given the purchase of another.
6. Interpret the outcomes: Interpret the market basket analysis results
to identify frequently co-purchased items, the strength of
associations between items, and any valuable insights into customer
behavior and preferences.
7. Implement actions: Utilize the insights from the market basket
analysis to drive business decisions, such as personalized product
recommendations, optimizing store layout, and devising targeted
marketing campaigns.
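
As a rough sketch of steps 1 and 2, the snippet below groups raw (transaction ID, item) rows into one basket per transaction. The row layout, the sample rows, and the helper name build_baskets are assumptions made for illustration; real transactional data will usually need more cleaning than this.

```python
from collections import defaultdict


def build_baskets(rows):
    """Group (transaction_id, item) rows into one item set per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        if item is None or not item.strip():
            continue  # skip rows with a missing item name
        # Normalize spelling so "Milk" and "milk" count as the same item.
        baskets[transaction_id].add(item.strip().lower())
    return dict(baskets)


# Hypothetical raw rows, invented for illustration only.
rows = [
    (1, "Milk"), (1, "Bread "), (1, "milk"),
    (2, "Bread"), (2, None), (2, "Eggs"),
]
print(build_baskets(rows))
```

The resulting dictionary of item sets is the form that the later steps (frequent item set discovery and rule generation) typically work on.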

Three types of Market Basket Analysis

1. Association Rule Mining: This process entails detecting frequent item
sets and creating association rules that indicate the probability of
purchasing one item in conjunction with another. It is utilized to reveal
connections or associations between items within a transactional
dataset.
2. Sequence Analysis: This category of market basket analysis
concentrates on the order in which items are bought in a transaction. It
identifies frequent item sequences and generates sequential association
rules that describe the likelihood of one item sequence being followed
by another.
3. Cluster Analysis: This form of market basket analysis involves
categorizing similar items or transactions into clusters or segments
based on their characteristics. It aids in the identification of customer
segments with comparable purchasing behaviors, which can be valuable
for product recommendations and marketing strategies.

Association rules can be applied to diverse scenarios beyond traditional
retail "baskets." For instance:

• Analyzing items purchased using credit cards, such as rental cars
and hotel rooms, can offer valuable insights into customers'
subsequent product preferences.
• Similarly, studying optional services availed by telecommunications
customers, like call waiting, call forwarding, DSL, and speed call, aids
in optimizing service bundling for increased revenue.
• Examining banking products used by retail customers, such as
money market accounts, certificates of deposit, investment services,
and car loans, can identify potential customers for other offerings.
• Additionally, detecting uncommon combinations of insurance claims
can serve as a signal for potential fraud and trigger further
investigation.
• Moreover, analyzing medical patient histories can provide clues
about likely complications based on specific treatment combinations.

Association Rule Mining

Association rule mining is a valuable data mining method employed to
uncover intriguing connections, patterns, or associations among items in a
dataset. Its application is particularly beneficial in transactional databases,
where items are bought together, events coincide, or items co-occur under
specific circumstances.

The main goal of association rule mining is to detect rules that
demonstrate the frequency of item co-occurrences in transactions or events.
These rules are commonly presented as "if-then" statements, with the "if"
section representing the antecedent (a group of items) and the "then" section
representing the consequent (another set of items).
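
To make the "if-then" form concrete, the sketch below lists every way a single itemset can be split into an antecedent and a consequent. The function name candidate_rules and the sample itemset are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of `itemset`."""
    items = sorted(itemset)
    # Antecedent sizes run from 1 to len(items) - 1 so both sides are non-empty.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


for antecedent, consequent in candidate_rules({"bread", "milk", "eggs"}):
    print(f"if {antecedent} then {consequent}")
```

An itemset with three items yields six candidate rules. Which of them are worth keeping depends on measures such as support and confidence.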


Association rule mining has a wide range of applications in various
industries due to its ability to discover interesting relationships and patterns
among items in datasets. Some of the key applications of association rule
mining include:

1. Market Basket Analysis: One of the most popular applications of
association rule mining is in retail for market basket analysis. It
helps retailers understand customer buying behavior by
identifying which items are frequently purchased together. This
information is used to optimize product placement, cross-selling,
and targeted marketing strategies.
2. Recommender Systems: Association rule mining is used in building
recommender systems, which provide personalized
recommendations to users based on their past behavior or
preferences. By discovering associations between items, the
system can suggest relevant products, movies, music, or articles
to users.
3. Web Usage Mining: In the field of web usage mining, association
rule mining is applied to analyze web log data and discover
patterns of website navigation by users. This information can be
used to improve website design, content organization, and user
experience.
4. Healthcare and Medical Research: Association rule mining is used
in healthcare to analyze medical data and identify patterns in
patient treatment and diagnosis. It can help in understanding
disease co-occurrences, drug interactions, and predicting patient
outcomes.
5. Fraud Detection: In financial and insurance industries, association
rule mining is applied to detect fraudulent activities by identifying
unusual patterns or combinations of transactions that might
indicate fraud.

6. Customer Segmentation: By analyzing customer purchase
behavior, association rule mining can help segment customers
into different groups based on their preferences and habits. This
information is valuable for targeted marketing and customer
retention strategies.
7. Supply Chain Management: Association rule mining is used in
supply chain management to identify patterns in product demand,
optimize inventory management, and understand the
relationships between different products in the supply chain.
8. Social Network Analysis: In social network analysis, association
rule mining can be applied to understand the connections and
interactions between users in a social network. It helps in
identifying influential users, community detection, and
recommendation of friends or connections.
9. Market Research: Association rule mining is used in market
research to analyze survey data and identify patterns or
associations among different variables, such as demographic
information and buying preferences.

Overall, association rule mining is a versatile technique with a wide
range of applications across different domains, providing valuable insights for
decision-making and improving business processes.

Measures used in Association Rule Mining

The two essential measures used in association rule mining are:

Support: Support measures how frequently an itemset (a combination
of items) appears in the dataset. It is the proportion of transactions that
contain the itemset. High support values indicate that the itemset is common
in the dataset.


Confidence: Confidence measures the reliability or strength of the
association between the antecedent and the consequent in a rule. It is the
proportion of transactions containing the antecedent that also contain the
consequent. High confidence values indicate a strong association between the
items.
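
A minimal sketch of the confidence calculation follows. The baskets are invented for illustration, and the helper names count and confidence are assumptions, not part of any particular library.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    base = count(antecedent, transactions)
    return both / base if base else 0.0


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "software"},
]

print(confidence({"computer"}, {"software"}, baskets))  # 0.75
```

Four baskets contain a computer and three of those also contain software, so the rule computer → software has a confidence of 3/4 in this toy data.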

Rule Form:

Antecedent → Consequent [support, confidence]

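Example 1:

• buys(x, “computer”) → buys(x, “financial management software”) [0.5%, 60%]

• age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “car”) [1%, 75%]

The first rule reads as follows: 0.5% of all transactions contain both a
computer and financial management software (support), and 60% of the
customers who buy a computer also buy the software (confidence).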

Orange juice and soda are more likely to be purchased together than
any other two items.

Detergent is never purchased with window cleaner or milk.

Milk is never purchased with soda or detergent.

Example 2:

The store has recorded the following transactions over a certain period:

Transaction 1: Bread, Milk, Eggs, Cheese

Transaction 2: Bread, Milk, Diapers, Beer

Transaction 3: Milk, Diapers, Beer, Chips

Transaction 4: Bread, Eggs, Cheese

Transaction 5: Milk, Diapers, Beer, Bread

Now, let's perform market basket analysis with a minimum support
threshold of 2 (meaning the itemsets must appear in at least two
transactions) and a minimum confidence threshold of 50%.

• Step 1: Frequent Itemset Generation

Based on the minimum support threshold, the frequent 2-itemsets examined in
this example are:

- {Bread, Milk}: Appears in transactions 1, 2, and 5

- {Milk, Diapers}: Appears in transactions 2, 3, and 5

- {Milk, Beer}: Appears in transactions 2, 3, and 5

- {Bread, Eggs}: Appears in transactions 1 and 4

- {Bread, Cheese}: Appears in transactions 1 and 4

- {Diapers, Beer}: Appears in transactions 2, 3, and 5

Other pairs, such as {Bread, Diapers}, {Bread, Beer}, and {Eggs, Cheese}, also
appear in two transactions each and are therefore frequent, but they are not
pursued further here.

• Step 2: Association Rules Generation

For each frequent itemset, we generate association rules and keep those that
meet the minimum confidence threshold. The confidence of a rule is the
support of the whole itemset divided by the support of the antecedent. Bread
and Milk each appear in four transactions, and Diapers appears in three:

- {Bread} => {Milk}: Appears in transactions 1, 2, and 5
(support = 3), confidence = 3/4 = 75%

- {Milk} => {Diapers}: Appears in transactions 2, 3, and 5
(support = 3), confidence = 3/4 = 75%

- {Milk} => {Beer}: Appears in transactions 2, 3, and 5
(support = 3), confidence = 3/4 = 75%

- {Bread} => {Eggs}: Appears in transactions 1 and 4
(support = 2), confidence = 2/4 = 50%

- {Bread} => {Cheese}: Appears in transactions 1 and 4
(support = 2), confidence = 2/4 = 50%

- {Diapers} => {Beer}: Appears in transactions 2, 3, and 5
(support = 3), confidence = 3/3 = 100%

• Step 3: Interpretation

Based on the results of the analysis, we can draw some insights:

- Customers who buy bread are highly likely to buy milk and vice
versa (75% confidence).

- Customers who buy milk are highly likely to buy diapers
(75% confidence).

- Customers who buy milk are highly likely to buy beer
(75% confidence).

- Customers who buy bread are moderately likely to buy eggs
(50% confidence).

- Customers who buy bread are moderately likely to buy cheese
(50% confidence).

- In this data, customers who buy diapers always buy beer
(100% confidence).

These insights can help the store in various ways, such as optimizing
product placement, running targeted promotions, and improving inventory
management.
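
A hand computation like Example 2 is easy to get wrong, so it can help to recompute the pair supports and rule confidences directly from the five transactions. The sketch below does this for 2-itemsets only; the variable names are illustrative, and the thresholds are the ones stated in the example.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Milk", "Eggs", "Cheese"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Milk", "Diapers", "Beer", "Chips"},
    {"Bread", "Eggs", "Cheese"},
    {"Milk", "Diapers", "Beer", "Bread"},
]
MIN_SUPPORT = 2       # minimum support count from the example
MIN_CONFIDENCE = 0.5  # minimum confidence from the example

# Support counts for single items and for every pair that occurs.
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

# Keep both directions of each frequent pair if the confidence is high enough.
rules = {}
for (a, b), n in pair_counts.items():
    if n < MIN_SUPPORT:
        continue
    for x, y in ((a, b), (b, a)):
        conf = n / item_counts[x]
        if conf >= MIN_CONFIDENCE:
            rules[(x, y)] = (n, conf)

for (x, y), (n, conf) in sorted(rules.items()):
    print(f"{{{x}}} => {{{y}}}: support = {n}, confidence = {conf:.0%}")
```

The printed list contains every single-item rule that passes both thresholds, which makes it a convenient cross-check for the rules discussed in the example.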


ASSESSMENT TASK

1. What is market basket analysis?


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

2. What is the objective of market basket analysis?


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

3. How do you calculate market basket analysis?


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

4. Give a simple example of a market basket analysis.


______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Lesson 2: Apriori Algorithm


The Apriori algorithm was one of the first algorithms proposed for frequent
itemset mining. It was introduced by R. Agrawal and R. Srikant in 1994. The
algorithm employs two key steps, namely "join" and "prune," to limit the
search space, and it works iteratively, level by level, to identify the
frequent itemsets.

Apriori says:

An itemset I is considered not frequent under the following conditions:

o If the support P(I) of itemset I is below the minimum support
threshold, then I is deemed not frequent.
o Similarly, if the support P(I ∪ A) of the itemset formed by adding
another item A to I falls below the minimum support threshold, then
the combined itemset I ∪ A is also considered not frequent.
o Additionally, if an itemset's support is less than the minimum
support, then all of its supersets will also have support below the
minimum support, so they need not be examined. This property is
referred to as the antimonotone property.
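
The antimonotone property can be checked on a small example: adding items to an itemset can never increase its support count. The transactions below are hypothetical and serve only to illustrate the idea.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Hypothetical transactions, invented for illustration only.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]

# A chain of growing itemsets: each one is a superset of the previous one.
chain = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
counts = [support_count(s, transactions) for s in chain]
print(counts)  # [3, 2, 1]

# Support never increases along the chain.
print(all(x >= y for x, y in zip(counts, counts[1:])))  # True
```

With a minimum support count of 2, {a, b, c} fails the threshold, so any larger itemset containing it can be skipped without counting.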

Steps in Apriori Algorithm:

The Apriori algorithm is a well-known method for mining frequent itemsets in
transactional databases. It employs a step-by-step approach to efficiently
discover the most frequent itemsets. The key steps of the Apriori algorithm
are as follows:

1. Generate Initial Candidate Itemsets (Level 1):
• Scan the database to count the occurrences of each individual item
(1-itemsets).
• Form a list of 1-itemsets with support values greater than or equal to
the minimum support threshold.
2. Identify Frequent Itemsets (Level 1):
• Consider the frequent 1-itemsets found in the first step as frequent
itemsets at level 1.
3. Generate Candidate Itemsets (Level k > 1):
• Based on the frequent k-itemsets discovered in the previous level,
generate candidate (k+1)-itemsets.
• Form candidate (k+1)-itemsets by joining two frequent k-itemsets if
their first (k-1) items are the same. For example, if {A, B} and {A, C}
are frequent 2-itemsets, then {A, B, C} becomes a candidate 3-itemset.
4. Prune the Candidate Itemsets (Level k > 1):
• Remove any candidate (k+1)-itemsets that contain subsets of size k
that are not frequent. This is based on the Apriori principle, which
states that any subset of a frequent itemset must also be frequent.
5. Scan the Database and Count Support:
• Count the support for each candidate (k+1)-itemset by scanning the
database and tallying the occurrences of each candidate itemset.
6. Generate Frequent Itemsets (Level k > 1):
• Form a list of frequent (k+1)-itemsets by selecting the itemsets with
support values greater than or equal to the minimum support
threshold.
7. Repeat the Process:
• Continue the iteration, generating candidate itemsets, pruning, and
counting support until no more frequent itemsets can be found.
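
The join and prune steps (steps 3 and 4) can be sketched as a single function. This is a minimal illustration rather than a reference implementation: it assumes each itemset is stored as a sorted tuple, and the name apriori_gen and the sample input are chosen for the example.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Join and prune: build candidate (k+1)-itemsets from frequent k-itemsets.

    `frequent_k` is a collection of sorted tuples, all of the same length k.
    """
    frequent_k = set(frequent_k)
    ordered = sorted(frequent_k)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: merge two k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune step: every k-subset of the candidate must be frequent.
            if all(sub in frequent_k
                   for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates


# Hypothetical frequent 2-itemsets, invented for illustration only.
frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(apriori_gen(frequent_2))  # [('A', 'B', 'C')]
```

In this input, {B, C} and {B, D} join to form {B, C, D}, but that candidate is pruned because its subset {C, D} is not frequent. Only {A, B, C} survives and would go on to the support-counting scan.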

The Apriori algorithm terminates when it can no longer produce
additional frequent itemsets, and it outputs all the frequent itemsets
discovered in the database. These frequent itemsets are sets of items that
meet the minimum support threshold and represent the most recurring
patterns within the dataset. The strength of the Apriori algorithm lies in its
ability to efficiently reduce the search space by employing the Apriori
principle, which avoids exploring candidate itemsets that cannot be frequent
based on the frequency of their subsets.

Example:

Let's go through a simple example computation of the Apriori
algorithm for frequent itemset mining. Consider the following transactional
database:

Transaction   Items
1             A, B, C
2             A, B, D, E
3             A, C, E
4             B, D, E
5             C, E

Let's assume the minimum support threshold is set to 2 (i.e., an itemset
must appear in at least 2 transactions to be considered frequent).

Step 1: Generate Candidate Itemsets (Level 1)

- Count the occurrences of individual items in the transactions:

Item   Support
A      3
B      3
C      3
D      2
E      4

- Form a list of frequent 1-itemsets with support greater than or
equal to 2:

Frequent 1-itemsets: {A}, {B}, {C}, {D}, {E}

Step 2: Generate Candidate Itemsets (Level 2)

- Based on the frequent 1-itemsets, generate candidate 2-itemsets:

Candidate 2-itemsets: {A, B}, {A, C}, {A, D}, {A, E}, {B, C},
{B, D}, {B, E}, {C, D}, {C, E}, {D, E}

Step 3: Prune the Candidate Itemsets (Level 2)

- A candidate 2-itemset is pruned if any of its subsets of size 1 is not
frequent. Every individual item is frequent here, so no candidate is
pruned:

Candidate 2-itemsets after pruning: {A, B}, {A, C}, {A, D}, {A, E}, {B, C},
{B, D}, {B, E}, {C, D}, {C, E}, {D, E}

Step 4: Scan the Database and Count Support for Candidate
Itemsets (Level 2)

- Count the support for each candidate 2-itemset in the transactions:

Itemset   Support
{A, B}    2
{A, C}    2
{A, D}    1
{A, E}    2
{B, C}    1
{B, D}    2
{B, E}    2
{C, D}    0
{C, E}    2
{D, E}    2

Step 5: Generate Frequent Itemsets (Level 2)

- Form a list of frequent 2-itemsets with support greater than or
equal to 2:

Frequent 2-itemsets: {A, B}, {A, C}, {A, E}, {B, D}, {B, E},
{C, E}, {D, E}

Step 6: Generate Candidate Itemsets (Level 3)

- Based on the frequent 2-itemsets, generate candidate 3-itemsets by
joining pairs that share their first item:

Candidate 3-itemsets: {A, B, C}, {A, B, E}, {A, C, E}, {B, D, E}

Step 7: Prune the Candidate Itemsets (Level 3)

- A candidate 3-itemset is pruned if any of its subsets of size 2 is not
frequent. {A, B, C} is pruned because its subset {B, C} is not frequent:

Candidate 3-itemsets after pruning: {A, B, E}, {A, C, E}, {B, D, E}

Step 8: Scan the Database and Count Support for Candidate
Itemsets (Level 3)

- Count the support for each remaining candidate 3-itemset in the
transactions:

Itemset     Support
{A, B, E}   1
{A, C, E}   1
{B, D, E}   2

Step 9: Generate Frequent Itemsets (Level 3)

- Form a list of frequent 3-itemsets with support greater than or
equal to 2:

Frequent 3-itemsets: {B, D, E}

The computation stops here: with only one frequent 3-itemset, no candidate
4-itemsets can be generated. The frequent itemsets discovered in the database
are the five 1-itemsets {A}, {B}, {C}, {D}, {E}; the seven 2-itemsets {A, B},
{A, C}, {A, E}, {B, D}, {B, E}, {C, E}, {D, E}; and the 3-itemset {B, D, E}.
These are the sets of items that meet the minimum support threshold and
represent the most frequent patterns within the dataset.
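
One way to check a hand computation like this is to run the same level-by-level procedure in code. The sketch below is a compact, unoptimized illustration of Apriori applied to the five-transaction database from the example; the function name apriori and its return format are choices made for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    def count(candidates):
        # One pass over the database per level: tally each candidate.
        return {c: sum(1 for t in transactions if set(c) <= t)
                for c in candidates}

    items = sorted({item for t in transactions for item in t})
    level = {c: n for c, n in count([(i,) for i in items]).items()
             if n >= min_count}
    frequent = dict(level)
    while level:
        prev = sorted(level)
        k = len(prev[0])
        candidates = []
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] != b[:-1]:
                    continue  # join only itemsets sharing their first k-1 items
                cand = a + (b[-1],)
                # Prune candidates that have an infrequent k-subset.
                if all(s in level for s in combinations(cand, k)):
                    candidates.append(cand)
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
    return frequent


database = [
    {"A", "B", "C"},
    {"A", "B", "D", "E"},
    {"A", "C", "E"},
    {"B", "D", "E"},
    {"C", "E"},
]
result = apriori(database, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

Running it prints each frequent itemset with its support count, level by level, which can be compared against the tables in the example.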

ASSESSMENT TASK

A. Consider the following dataset. Find the frequent itemsets and
generate association rules for them.

Transaction ID   Items
T1               I1, I2, I5
T2               I2, I4
T3               I2, I3
T4               I1, I2, I4
T5               I1, I3
T6               I2, I3
T7               I1, I3
T8               I1, I2, I3, I5
T9               I1, I2, I3

Minimum support count: 2
Minimum confidence: 60%


References / Further Readings

Textbooks / References

Gordon S. Linoff and Michael J. Berry. (2011). “Data Mining Techniques:
For Marketing, Sales, and Customer Relationship Management”.

Jiawei Han and Micheline Kamber. (2006). “Data Mining: Concepts and
Techniques”. Morgan Kaufmann Publishers.

Mathur, Vrinda. (2022). “Association Rule Mining: Importance and
Steps”. https://www.analyticssteps.com/blogs/association-rule-mining-importance-and-steps
