
Module 4

ASSOCIATION RULE MINING

 Goal:
Association rule mining aims to identify interesting relationships or
associations between different items or variables within a dataset.
 "If-then" Rules:
These rules are expressed in the form of "if X, then Y", where X is the
condition (antecedent) and Y is the outcome (consequent).
 Applications:
It's widely used in various fields, including market basket analysis (identifying
products bought together), web usage mining, bioinformatics, and more.
How it works:
 Frequent Itemset Mining:
The process typically starts with identifying frequent itemsets, which are
groups of items that frequently occur together in the dataset.
 Rule Generation:
Once frequent itemsets are identified, association rules are generated based
on the co-occurrence of items within these itemsets.
 Rule Evaluation:
The generated rules are then evaluated based on metrics like support,
confidence, and lift to determine their strength and relevance.
Examples:
 Market Basket Analysis: "If a customer buys bread, they are also likely to
buy milk".
 Retail: Identifying products that are frequently purchased together to improve
store layout, product placement, and marketing efforts.
 Healthcare: Studying frequent symptom clusters to guide diagnoses or
identify risk factors.
 Finance: Detecting unusual purchase or transfer patterns that may indicate
fraud.
Key Algorithms:
 Apriori Algorithm: A well-known algorithm for finding frequent itemsets and
generating association rules.
 FP-Growth Algorithm: Another popular algorithm, particularly efficient for
large datasets.
 Eclat Algorithm: An algorithm that uses a different approach to find frequent
itemsets.

1. Frequent Itemset Generation:

The Apriori algorithm starts by identifying frequent itemsets in the dataset. A
frequent itemset is a collection of items (or attributes) that occurs together
frequently in the data.

The frequency of an itemset is measured by a metric called support: the
proportion of transactions (or records) in the dataset in which the itemset
appears.

Apriori uses a bottom-up approach: it first finds the frequent individual items
and then gradually combines them into larger frequent itemsets, as sketched
below.
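A minimal Python sketch of this step, using hypothetical transactions (the item names and the 0.5 threshold are illustrative, not from the text):

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 0.5  # an itemset must appear in at least half of the transactions

def support(itemset, transactions):
    """Support = proportion of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Bottom-up, level 1: frequent individual items.
items = {i for t in transactions for i in t}
L1 = {frozenset([i]) for i in items if support({i}, transactions) >= min_support}

# Level 2: combine frequent items into candidate pairs, keep the frequent ones.
L2 = {frozenset(p) for p in combinations(sorted({i for s in L1 for i in s}), 2)
      if support(set(p), transactions) >= min_support}

print(L1)  # frequent 1-itemsets
print(L2)  # frequent 2-itemsets
```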

2. Association Rule Generation:

After identifying the frequent itemsets, association rules are generated from
them.

Each association rule is written as an "if-then" statement, where the "if" part
is called the antecedent (premise) and the "then" part is called the consequent
(conclusion).

The Apriori algorithm generates candidate rules by splitting each frequent
itemset into an antecedent and a consequent, as sketched below.
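A minimal sketch of this splitting step (the helper name generate_rules is hypothetical): each frequent itemset yields every possible antecedent/consequent pair.

```python
from itertools import combinations

def generate_rules(frequent_itemset):
    """Yield every (antecedent -> consequent) split of a frequent itemset."""
    items = sorted(frequent_itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            yield set(antecedent), set(consequent)

for lhs, rhs in generate_rules({"bread", "milk", "butter"}):
    print(f"if {lhs} then {rhs}")
```

Each candidate rule produced this way is then scored with the metrics described in the next step.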

3. Rule Pruning:

Pruning criteria are applied to ensure that only meaningful rules are kept. The
most useful criteria are as follows (a worked example follows the list).

1. Support Threshold: A rule must meet a minimum support to be considered
   valid. This ensures that the rule applies to a sufficient number of
   transactions.
2. Confidence Threshold: A rule must have a minimum confidence level to be
   considered interesting. Confidence is the conditional probability of the
   consequent given the antecedent, and it measures the strength of the
   association.
3. Lift Threshold: Lift is a measure that compares the observed support of the rule
to what would be expected if the items in the rule were independent. A lift value
greater than 1 indicates a positive association, while a lift value less than 1
indicates a negative association.
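A worked sketch of the three metrics for one hypothetical rule, {bread} -> {milk}, over illustrative transactions:

```python
# Hypothetical transactions; rule under test: {bread} -> {milk}.
transactions = [
    {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk", "butter"}, {"milk"},
]
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)   # fraction containing X and Y
confidence = rule_support / support(antecedent)   # P(consequent | antecedent)
lift = confidence / support(consequent)           # > 1 => positive association

print(rule_support, confidence, lift)
```

A rule survives pruning only if each value clears its corresponding threshold.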
4. Iterative Process:

The Apriori algorithm iterates over these steps, generating itemsets, creating
rules, and pruning them, until no more valid rules can be generated.

During iteration, the algorithm exploits the "downward closure property," which
states that if an itemset is frequent, all of its subsets are also frequent;
equivalently, no superset of an infrequent itemset can be frequent. This
property lets the algorithm prune candidates early and reduces its
computational complexity, as illustrated below.
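A minimal sketch of this pruning check (itemsets and the helper name are hypothetical): a k-candidate is only worth counting if every (k-1)-subset was frequent in the previous pass.

```python
from itertools import combinations

def has_frequent_subsets(candidate, frequent_prev):
    """Downward closure: a k-candidate can be frequent only if every
    (k-1)-subset is already known to be frequent; otherwise prune it."""
    k = len(candidate)
    return all(frozenset(s) in frequent_prev for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets from the previous pass.
L2 = {frozenset({"bread", "milk"}), frozenset({"bread", "butter"}),
      frozenset({"milk", "butter"})}
candidate = frozenset({"bread", "milk", "butter"})
print(has_frequent_subsets(candidate, L2))  # True: all 2-subsets are frequent
```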

Key Algorithms for Association Rule Mining:


 Apriori Algorithm:
This is a classic algorithm that iteratively identifies frequent itemsets by
scanning the database multiple times.
 It uses a bottom-up approach, starting with individual items and gradually
combining them into larger itemsets.
 It's known for its simplicity and effectiveness, but can be computationally
expensive for large datasets.

To enhance the Apriori algorithm's efficiency, you can employ hash-based
techniques for itemset counting, transaction reduction by discarding
transactions that contain no frequent itemsets, and partitioning the database
into smaller segments for parallel processing.
Here's a more detailed explanation of each method:
 Hash-Based Techniques:
 Instead of scanning the entire database multiple times to count itemset support,
use hash tables or hash trees to efficiently store and update itemset counts.
 This reduces the time complexity of counting frequent itemsets, especially for
larger datasets.
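A minimal sketch of the bucket-counting idea (in the style of PCY-style hashing; the bucket count, threshold, and data are illustrative): pair counts accumulate in a small bucket array, and only pairs landing in a sufficiently hot bucket remain candidates.

```python
from itertools import combinations

# Illustrative transactions and parameters.
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
NUM_BUCKETS = 11
min_count = 2

# First scan: hash every pair into a bucket and bump its count.
buckets = [0] * NUM_BUCKETS
for t in transactions:
    for pair in combinations(sorted(t), 2):
        buckets[hash(pair) % NUM_BUCKETS] += 1

def bucket_may_be_frequent(pair):
    """A pair can only be frequent if its bucket count meets the threshold."""
    return buckets[hash(pair) % NUM_BUCKETS] >= min_count

print(bucket_may_be_frequent(("bread", "milk")))
```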
 Transaction Reduction:
 Identify and remove transactions that do not contain any of the frequent itemsets
found in previous iterations.
 This reduces the number of transactions that need to be scanned in subsequent
iterations, leading to faster processing.
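A one-line sketch of transaction reduction (data hypothetical): a transaction containing fewer than k frequent items cannot contribute to any frequent k-itemset, so it is dropped before the next pass.

```python
frequent_items = {"bread", "milk", "butter"}  # frequent items from the last pass
k = 2                                         # itemset size for the next pass

transactions = [{"bread", "milk"}, {"eggs"}, {"bread", "butter", "jam"}]
reduced = [t for t in transactions if len(t & frequent_items) >= k]
print(reduced)  # the {'eggs'} transaction is discarded
```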
 Partitioning:
 Divide the database into smaller partitions and find frequent itemsets in each
partition independently.
 Combine the results from each partition to identify the global frequent itemsets.
 This approach allows for parallel processing and can significantly reduce the
overall time taken to find frequent itemsets, especially for very large datasets.
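A minimal sketch of partitioning (the local miner is a placeholder and the data is illustrative): any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a complete candidate set that one final scan can verify.

```python
from collections import Counter

def local_frequent(partition, min_support):
    """Placeholder local miner: frequent 1-itemsets within one partition."""
    n = len(partition)
    counts = Counter(item for t in partition for item in t)
    return {frozenset([i]) for i, c in counts.items() if c / n >= min_support}

transactions = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
partitions = [transactions[:2], transactions[2:]]  # could be mined in parallel

# Union of local results = global candidate set.
candidates = set().union(*(local_frequent(p, 0.5) for p in partitions))

# One full scan confirms which candidates are globally frequent.
confirmed = {c for c in candidates
             if sum(c <= t for t in transactions) / len(transactions) >= 0.5}
print(confirmed)
```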

 FP-Growth (Frequent Pattern Growth) Algorithm:
This algorithm uses a tree-like structure (FP-tree) to represent frequent
patterns, making it faster and more efficient than Apriori, especially for
larger datasets.
 It avoids repeated database scans by compressing the transactions into the
FP-tree; only two full scans of the database are needed.
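A minimal usage sketch, assuming the third-party mlxtend library is available (pip install mlxtend); its fpgrowth function builds the FP-tree internally and returns the frequent itemsets.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Illustrative transactions, one-hot encoded as fpgrowth expects.
transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk"]]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```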
