
Module 4

ASSOCIATION RULE MINING

 Goal:
Association rule mining aims to identify interesting relationships or
associations between different items or variables within a dataset.
 "If-then" Rules:
These rules are expressed in the form of "if X, then Y", where X is the
condition (antecedent) and Y is the outcome (consequent).
 Applications:
It's widely used in various fields, including market basket analysis (identifying
products bought together), web usage mining, bioinformatics, and more.
How it works:
 Frequent Itemset Mining:
The process typically starts with identifying frequent itemsets, which are
groups of items that frequently occur together in the dataset.
 Rule Generation:
Once frequent itemsets are identified, association rules are generated based
on the co-occurrence of items within these itemsets.
 Rule Evaluation:
The generated rules are then evaluated based on metrics like support,
confidence, and lift to determine their strength and relevance.
Examples:
 Market Basket Analysis: "If a customer buys bread, they are also likely to
buy milk".
 Retail: Identifying products that are frequently purchased together to improve
store layout, product placement, and marketing efforts.
 Healthcare: Studying frequent symptom clusters to guide diagnoses or
identify risk factors.
 Finance: Detecting unusual purchase or transfer patterns that may indicate
fraud.
Key Algorithms:
 Apriori Algorithm: A well-known algorithm for finding frequent itemsets and
generating association rules.
 FP-Growth Algorithm: Another popular algorithm, particularly efficient for
large datasets.
 Eclat Algorithm: An algorithm that uses a different approach to find frequent
itemsets.

1. Frequent Itemset Generation:

The Apriori algorithm starts by identifying frequent itemsets in the dataset. A
frequent itemset is a collection of items (or attributes) that occurs together
frequently in the data.

The frequency of an itemset is measured by a metric called support: the
proportion of transactions (or records) in the dataset in which the itemset
appears.

Apriori uses a bottom-up approach: it first finds the frequent individual items
and then gradually combines them into larger frequent itemsets, as sketched
below.
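A minimal Python sketch of this step, using hypothetical transactions (the item names and the 0.5 threshold are illustrative, not from the text):

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 0.5  # an itemset must appear in at least half of the transactions

def support(itemset, transactions):
    """Support = proportion of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Bottom-up, level 1: frequent individual items.
items = {i for t in transactions for i in t}
L1 = {frozenset([i]) for i in items if support({i}, transactions) >= min_support}

# Level 2: combine frequent items into candidate pairs, keep the frequent ones.
L2 = {frozenset(p) for p in combinations(sorted({i for s in L1 for i in s}), 2)
      if support(set(p), transactions) >= min_support}

print(L1)  # frequent 1-itemsets
print(L2)  # frequent 2-itemsets
```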

2. Association Rule Generation:

After identifying the frequent itemsets, association rules are generated from
them.

Each association rule is written as an "if-then" statement, where the "if" part
is called the antecedent (premise) and the "then" part is called the consequent
(conclusion).

The Apriori algorithm generates candidate rules by splitting each frequent
itemset into an antecedent and a consequent, as sketched below.
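A minimal sketch of this splitting step (the helper name generate_rules is hypothetical): each frequent itemset yields every possible antecedent/consequent pair.

```python
from itertools import combinations

def generate_rules(frequent_itemset):
    """Yield every (antecedent -> consequent) split of a frequent itemset."""
    items = sorted(frequent_itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            yield set(antecedent), set(consequent)

for lhs, rhs in generate_rules({"bread", "milk", "butter"}):
    print(f"if {lhs} then {rhs}")
```

Each candidate rule produced this way is then scored with the metrics described in the next step.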

3. Rule Pruning:

Pruning criteria are applied to ensure that only meaningful rules are kept. The
most useful criteria are as follows (a worked example follows the list).

1. Support Threshold: A rule must meet a minimum support to be considered
   valid. This ensures that the rule applies to a sufficient number of
   transactions.
2. Confidence Threshold: A rule must have a minimum confidence level to be
   considered interesting. Confidence is the conditional probability of the
   consequent given the antecedent, and it measures the strength of the
   association.
3. Lift Threshold: Lift is a measure that compares the observed support of the rule
to what would be expected if the items in the rule were independent. A lift value
greater than 1 indicates a positive association, while a lift value less than 1
indicates a negative association.
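A worked sketch of the three metrics for one hypothetical rule, {bread} -> {milk}, over illustrative transactions:

```python
# Hypothetical transactions; rule under test: {bread} -> {milk}.
transactions = [
    {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk", "butter"}, {"milk"},
]
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)   # fraction containing X and Y
confidence = rule_support / support(antecedent)   # P(consequent | antecedent)
lift = confidence / support(consequent)           # > 1 => positive association

print(rule_support, confidence, lift)
```

A rule survives pruning only if each value clears its corresponding threshold.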
4. Iterative Process:

The Apriori algorithm iterates over these steps, generating itemsets, creating
rules, and pruning them, until no more valid rules can be generated.

During iteration, the algorithm exploits the "downward closure property," which
states that if an itemset is frequent, all of its subsets are also frequent;
equivalently, no superset of an infrequent itemset can be frequent. This
property lets the algorithm prune candidates early and reduces its
computational complexity, as illustrated below.
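A minimal sketch of this pruning check (itemsets and the helper name are hypothetical): a k-candidate is only worth counting if every (k-1)-subset was frequent in the previous pass.

```python
from itertools import combinations

def has_frequent_subsets(candidate, frequent_prev):
    """Downward closure: a k-candidate can be frequent only if every
    (k-1)-subset is already known to be frequent; otherwise prune it."""
    k = len(candidate)
    return all(frozenset(s) in frequent_prev for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets from the previous pass.
L2 = {frozenset({"bread", "milk"}), frozenset({"bread", "butter"}),
      frozenset({"milk", "butter"})}
candidate = frozenset({"bread", "milk", "butter"})
print(has_frequent_subsets(candidate, L2))  # True: all 2-subsets are frequent
```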

Key Algorithms for Association Rule Mining:


 Apriori Algorithm:
This is a classic algorithm that iteratively identifies frequent itemsets by
scanning the database multiple times.
 It uses a bottom-up approach, starting with individual items and gradually
combining them into larger itemsets.
 It's known for its simplicity and effectiveness, but can be computationally
expensive for large datasets.

To enhance the Apriori algorithm's efficiency, you can employ hash-based
techniques for itemset counting, transaction reduction by discarding
transactions that contain no frequent itemsets, and partitioning the database
into smaller segments for parallel processing.
Here's a more detailed explanation of each method:
 Hash-Based Techniques:
 Instead of scanning the entire database multiple times to count itemset support,
use hash tables or hash trees to efficiently store and update itemset counts.
 This reduces the time complexity of counting frequent itemsets, especially for
larger datasets.
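A minimal sketch of the bucket-counting idea (in the style of PCY-style hashing; the bucket count, threshold, and data are illustrative): pair counts accumulate in a small bucket array, and only pairs landing in a sufficiently hot bucket remain candidates.

```python
from itertools import combinations

# Illustrative transactions and parameters.
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
NUM_BUCKETS = 11
min_count = 2

# First scan: hash every pair into a bucket and bump its count.
buckets = [0] * NUM_BUCKETS
for t in transactions:
    for pair in combinations(sorted(t), 2):
        buckets[hash(pair) % NUM_BUCKETS] += 1

def bucket_may_be_frequent(pair):
    """A pair can only be frequent if its bucket count meets the threshold."""
    return buckets[hash(pair) % NUM_BUCKETS] >= min_count

print(bucket_may_be_frequent(("bread", "milk")))
```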
 Transaction Reduction:
 Identify and remove transactions that do not contain any of the frequent itemsets
found in previous iterations.
 This reduces the number of transactions that need to be scanned in subsequent
iterations, leading to faster processing.
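A one-line sketch of transaction reduction (data hypothetical): a transaction containing fewer than k frequent items cannot contribute to any frequent k-itemset, so it is dropped before the next pass.

```python
frequent_items = {"bread", "milk", "butter"}  # frequent items from the last pass
k = 2                                         # itemset size for the next pass

transactions = [{"bread", "milk"}, {"eggs"}, {"bread", "butter", "jam"}]
reduced = [t for t in transactions if len(t & frequent_items) >= k]
print(reduced)  # the {'eggs'} transaction is discarded
```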
 Partitioning:
 Divide the database into smaller partitions and find frequent itemsets in each
partition independently.
 Combine the results from each partition to identify the global frequent itemsets.
 This approach allows for parallel processing and can significantly reduce the
overall time taken to find frequent itemsets, especially for very large datasets.
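A minimal sketch of partitioning (the local miner is a placeholder and the data is illustrative): any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a complete candidate set that one final scan can verify.

```python
from collections import Counter

def local_frequent(partition, min_support):
    """Placeholder local miner: frequent 1-itemsets within one partition."""
    n = len(partition)
    counts = Counter(item for t in partition for item in t)
    return {frozenset([i]) for i, c in counts.items() if c / n >= min_support}

transactions = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
partitions = [transactions[:2], transactions[2:]]  # could be mined in parallel

# Union of local results = global candidate set.
candidates = set().union(*(local_frequent(p, 0.5) for p in partitions))

# One full scan confirms which candidates are globally frequent.
confirmed = {c for c in candidates
             if sum(c <= t for t in transactions) / len(transactions) >= 0.5}
print(confirmed)
```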

 FP-Growth (Frequent Pattern Growth) Algorithm:
This algorithm uses a tree-like structure (FP-tree) to represent frequent
patterns, making it faster and more efficient than Apriori, especially for
larger datasets.
 It avoids repeated database scans by compressing the transactions into the
FP-tree; only two full scans of the database are needed.
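A minimal usage sketch, assuming the third-party mlxtend library is available (pip install mlxtend); its fpgrowth function builds the FP-tree internally and returns the frequent itemsets.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Illustrative transactions, one-hot encoded as fpgrowth expects.
transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk"]]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```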
