Module 4 Full
Dr. L. C. Manikandan
Professor, Department of CSE,
Universal Engineering College.
Association Rules – Introduction
Methods to discover Association rules
Apriori (Level-wise algorithm)
Partition Algorithm
Pincer Search Algorithm
Dynamic Itemset Counting Algorithm
FP-tree Growth Algorithm
Association rule mining finds interesting associations and relationships
among large sets of data items.
Such a rule shows how frequently an itemset occurs in the set of transactions.
Example: Market Basket Analysis
It allows retailers to identify relationships between the items that people
buy together frequently.
This process analyzes customer buying habits by finding associations
between the different items that customers place in their “shopping
baskets”.
The discovery of such associations can help retailers develop marketing
strategies by gaining insight into which items are frequently purchased
together by customers.
Example: If customers are
buying milk, how likely are
they to also buy bread on the
same trip to the supermarket?
Such information can lead to
increased sales by helping
retailers do selective
marketing and plan their shelf
space.
Frequent Itemset
A set of items is referred to as an itemset.
An itemset that contains k items is a k-itemset.
The set {computer, antivirus_software} is a 2-itemset.
A frequent itemset is a set of items that occurs together frequently in a
dataset.
Ex1: In a supermarket environment, the items bread and butter are likely
to be purchased together by many customers.
So, {bread, butter} is an example of a frequent itemset.
The association between these items is represented by the association
rule:
bread ⇒ butter
Eg2: In an electronics store, customers who purchase computers also
tend to buy antivirus software at the same time.
So, {computer, antivirus_software} is an example of a frequent
itemset.
It is represented by the association rule:
computer ⇒ antivirus_software
Measures of Rule Interestingness:
Support and Confidence are two measures of rule interestingness.
Support reflects the usefulness of discovered association rules.
Confidence reflects the certainty of discovered association rules.
Association rules are considered interesting if they satisfy both a
minimum support threshold and a minimum confidence threshold.
Such thresholds can be set by users or domain experts.
Ex: Consider the following association rule:
computer ⇒ antivirus_software [support = 2%, confidence = 60%]
A support of 2% means that 2% of all the transactions under analysis show that
computer and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a
computer also bought the software.
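To make these two measures concrete, here is a minimal Python sketch (not from the slides) that computes support and confidence for a rule X ⇒ Y; the transactions and item names below are made up purely for illustration.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Hypothetical mini-dataset, purely for illustration.
transactions = [
    {"computer", "antivirus"},
    {"computer", "printer"},
    {"bread", "butter"},
    {"computer", "antivirus", "printer"},
]

print(support({"computer", "antivirus"}, transactions))       # 0.5
print(confidence({"computer"}, {"antivirus"}, transactions))  # 2/3, about 0.67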
The Apriori algorithm is a fundamental method in association rule mining,
primarily used to find frequent itemsets in large datasets.
It follows a level-wise approach, where frequent itemsets are iteratively
expanded using the Apriori property.
Key Concept:
If an itemset is frequent, then all its subsets must also be frequent
(Apriori Property).
If an itemset is infrequent, then all its supersets must also be infrequent.
Commonly used in: Market Basket Analysis, Recommendation Systems,
Fraud Detection.
Working of Apriori Algorithm
Step 1: Count Individual Item Frequencies
Scan the database and count the occurrences of each 1-itemset.
Remove infrequent items (i.e., those below the minimum support
threshold).
Step 2: Generate Candidate Itemsets (Ck)
Use previous frequent itemsets (Lk-1) to generate new k-itemsets
(Ck).
Only keep those whose subsets are frequent (Apriori Pruning).
Step 3: Compute Support & Prune Infrequent Itemsets
Scan the database and count occurrences of candidate k-itemsets.
Remove itemsets below the minimum support.
Step 4: Repeat Until No More Frequent Itemsets
Continue generating larger itemsets until no new frequent
itemsets are found.
Use these frequent itemsets to generate association rules
(Confidence & Lift).
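The four steps can be read directly off a small Python sketch. This is only a bare-bones illustration using the T1-T5 example dataset that follows; the min_support value of 0.6 is an assumption, and a library implementation would add many optimizations.

from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: Steps 1-4 above, simplified."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count the support of each candidate and keep those above min_support
        sup = {c: sum(c <= t for t in transactions) / n for c in candidates}
        return {c: s for c, s in sup.items() if s >= min_support}

    # Step 1: frequent 1-itemsets
    L = frequent({frozenset([i]) for t in transactions for i in t})
    result, k = dict(L), 2
    while L:
        # Step 2: join frequent (k-1)-itemsets to form candidate k-itemsets
        cands = {a | b for a in L for b in L if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Step 3: count support and drop infrequent candidates
        L = frequent(cands)
        result.update(L)
        k += 1                    # Step 4: repeat until no new frequent itemsets
    return result

transactions = [{"A","B","C"}, {"A","C"}, {"B","C","D"}, {"A","B","D"}, {"A","B","C","D"}]
print(apriori(transactions, min_support=0.6))
# -> the four single items plus {A,B}, {A,C}, {B,C}, {B,D}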
Example Dataset (Market Basket Transactions)
Transaction ID Items Purchased
T1 A, B, C
T2 A, C
T3 B, C, D
T4 A, B, D
T5 A, B, C, D
Association Rule Generation
Once frequent itemsets are identified, we generate association rules using
Confidence & Lift.
Example Rule: {A, B} → {C}
Confidence = Support(A, B, C) / Support(A, B)
Lift = Confidence({A, B} → {C}) / Support(C)
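For example, on the T1-T5 dataset above, the rule {A, B} → {C} can be evaluated directly; the short Python sketch below is illustrative, and the reading of lift below 1 as a weak association is a standard interpretation rather than something stated on the slides.

transactions = [{"A","B","C"}, {"A","C"}, {"B","C","D"}, {"A","B","D"}, {"A","B","C","D"}]
n = len(transactions)
sup_abc = sum({"A", "B", "C"} <= t for t in transactions) / n   # 2/5 = 0.4
sup_ab  = sum({"A", "B"} <= t for t in transactions) / n        # 3/5 = 0.6
sup_c   = sum({"C"} <= t for t in transactions) / n             # 4/5 = 0.8
confidence = sup_abc / sup_ab   # about 0.67
lift = confidence / sup_c       # about 0.83; lift < 1 suggests {A, B} does not boost C
print(confidence, lift)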
Advantages
Easy to Understand & Implement
Works Well for Small to Medium Datasets
Finds Strong Association Rules
Disadvantages
Multiple Database Scans → Slow for large datasets.
Exponential Candidate Growth → High memory usage.
Not Efficient for High-Dimensional Data
Pincer Search Algorithm
Pincer Search is an improved approach to frequent itemset mining.
Designed to find maximal frequent itemsets in large transactional databases.
It optimizes the traditional Apriori algorithm by combining bottom-up
(support-based pruning) and top-down (maximal frequent itemset search)
approaches.
Advantages:
Faster than Apriori (Fewer database scans).
Efficient for large datasets with long frequent itemsets.
Reduces computational complexity by using both top-down and
bottom-up approaches.
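A rough Python sketch of the top-down half of this idea is shown below. The dataset, the threshold, and the refinement step are deliberately simplified assumptions; the real Pincer Search maintains a Maximum Frequent Candidate Set (MFCS) and refines it using every infrequent itemset found by the bottom-up pass.

transactions = [{"A","B","C"}, {"A","B"}, {"A","C"}, {"B","C","D"}, {"A","B","C","D"}]
n, min_support = len(transactions), 0.6

def is_frequent(itemset):
    return sum(set(itemset) <= t for t in transactions) / n >= min_support

# Top-down search: start from the largest candidate and shrink it while infrequent.
candidate = {"A", "B", "C", "D"}
while len(candidate) > 1 and not is_frequent(candidate):
    # Simplification: drop the least frequent item instead of true MFCS refinement
    candidate.discard(min(candidate, key=lambda i: sum(i in t for t in transactions)))

print("maximal frequent itemset found top-down:", candidate)
# Every subset of this frequent itemset is frequent by the Apriori property,
# so the bottom-up pass does not need to count those subsets again.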
Partition Algorithm
Example dataset:
Transaction ID    Items Purchased
T1                A, B, C
T2                A, B
T3                A, C
T4                B, C, D
T5                A, B, C, D
Key Idea
Instead of scanning the entire database multiple times, the algorithm first
identifies local frequent itemsets within each partition and then merges
them to find globally frequent itemsets.
Steps of the Partitioning Algorithm
Step 1: Divide the Database into Partitions
The dataset is split into multiple partitions (subsets).
Each partition is processed independently, reducing memory
overhead.
Step 2: Find Local Frequent Itemsets
A frequent itemset algorithm (e.g., Apriori) is run on each partition with a
proportionally scaled minimum support.
Any itemset that is frequent in the full database must be frequent in at
least one partition, so no globally frequent itemset can be missed.
Step 3: Merge Local Frequent Itemsets
A global candidate set is formed by combining frequent itemsets
from all partitions.
The final global support count is calculated for each itemset across
the full dataset.
Only one final scan is needed, making the algorithm much faster than
Apriori.
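A compact Python sketch of the two phases follows, using the T1-T5 dataset above. The number of partitions, the min_support of 0.6, and the local_frequent helper are assumptions made for illustration; any frequent itemset miner could be run per partition.

def local_frequent(partition, min_support):
    """Tiny level-wise miner used as a stand-in for the per-partition phase."""
    n = len(partition)
    level = {frozenset([i]) for t in partition for i in t}
    found, k = set(), 1
    while level:
        level = {c for c in level
                 if sum(c <= t for t in partition) / n >= min_support}
        found |= level
        k += 1
        level = {a | b for a in level for b in level if len(a | b) == k}
    return found

def partition_mining(transactions, min_support, num_parts=2):
    transactions = [frozenset(t) for t in transactions]
    size = -(-len(transactions) // num_parts)        # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase 1: local frequent itemsets of every partition form the global candidates
    candidates = set()
    for p in parts:
        candidates |= local_frequent(p, min_support)
    # Phase 2: a single final scan computes each candidate's global support
    n = len(transactions)
    supports = {c: sum(c <= t for t in transactions) / n for c in candidates}
    return {c: s for c, s in supports.items() if s >= min_support}

transactions = [{"A","B","C"}, {"A","B"}, {"A","C"}, {"B","C","D"}, {"A","B","C","D"}]
print(partition_mining(transactions, min_support=0.6))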
How the Partitioning Algorithm Solves Apriori’s Disadvantages
By mining each partition independently and then verifying the merged
candidates in one final scan, the Partition algorithm needs at most two
passes over the database, avoiding Apriori’s repeated scans and large
candidate sets.
Dynamic Itemset Counting (DIC) Algorithm
Key Concept:
Instead of scanning the database multiple times (like Apriori), DIC
interleaves candidate generation and counting within a single database
pass.
It dynamically starts counting new itemsets before previous iterations are
completed, making it faster than Apriori.
Used in: Market Basket Analysis, Recommendation Systems, Web Mining.
Working of Dynamic Itemset Counting
Step 1: Partition the Database
The database is divided into equal-sized partitions.
Instead of waiting for a full database scan, new itemsets start being
counted midway in different partitions.
Dataset Example (Market Basket Transactions)
Step 2: Start Counting Frequent 1-Itemsets
After Partition 1, {A, B, C} are candidates.
After Partition 2, {D} appears frequently.
Frequent itemsets start forming dynamically before scanning all partitions.
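The interleaving is easiest to see in code. The sketch below is a simplified illustration only, reusing the T1-T5 basket data from the Apriori example with an assumed block size and minimum count, and without DIC's full dashed/solid itemset bookkeeping: a new candidate starts being counted as soon as all of its subsets look frequent, partway through the scan, and its count is completed by wrapping around over the earlier blocks.

from itertools import combinations

def dic(transactions, min_count, block_size):
    """Simplified Dynamic Itemset Counting sketch."""
    transactions = [frozenset(t) for t in transactions]
    blocks = [transactions[i:i + block_size]
              for i in range(0, len(transactions), block_size)]
    nb = len(blocks)
    # Active candidates: itemset -> [count so far, number of blocks seen]
    active = {frozenset([i]): [0, 0] for t in transactions for i in t}
    done = {}                       # candidates counted over the whole database
    b = 0
    while active:
        for cand in list(active):
            active[cand][0] += sum(cand <= t for t in blocks[b % nb])
            active[cand][1] += 1
            if active[cand][1] == nb:          # this candidate has now seen every block
                done[cand] = active.pop(cand)[0]
        # DIC's key move: once an itemset looks frequent, start counting its supersets
        hot = {c for c in (set(active) | set(done))
               if (active[c][0] if c in active else done[c]) >= min_count}
        for a in hot:
            for o in hot:
                new = a | o
                if (len(new) == len(a) + 1 and new not in active and new not in done
                        and all(frozenset(s) in hot for s in combinations(new, len(new) - 1))):
                    active[new] = [0, 0]       # starts counting mid-scan, from the next block
        b += 1
    return {c: cnt for c, cnt in done.items() if cnt >= min_count}

transactions = [{"A","B","C"}, {"A","C"}, {"B","C","D"}, {"A","B","D"}, {"A","B","C","D"}]
print(dic(transactions, min_count=3, block_size=2))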
Disadvantages:
Complex Implementation → More difficult than Apriori.
Frequent Pattern Tree (FP-Tree) Growth Algorithm is an efficient algorithm
used for frequent itemset mining in large datasets.
It is an improvement over the Apriori algorithm:
it avoids multiple database scans and candidate generation, making it
faster and more scalable.
Key Concept:
Uses a compact tree structure (FP-tree) to store frequent itemsets.
Eliminates the need for candidate generation like in Apriori.
Reduces database scans, improving efficiency for large datasets.
Used in: Market Basket Analysis, Web Mining, Bioinformatics.
Working of FP-Tree Growth Algorithm
Step 1: Scan the Database & Find Frequent Items
The dataset is scanned once to compute the support count of each item.
Items below minimum support are removed.
Step 2: Build the FP-Tree
The remaining items in each transaction are sorted in descending order of
support and inserted into the tree; transactions that share a prefix share
the same branch, with node counts incremented.
Step 3: Extract Frequent Itemsets using Conditional FP-Trees
Conditional FP-trees are built for each frequent item.
Frequent itemsets are extracted recursively.
Frequent Itemsets Found:
{A}, {B}, {C}, {D}, {A, B}, {A, C}, {B, C}, {B, D}, {A, B, C}, {B, C, D}
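A small Python sketch of Steps 1-2, building the compressed prefix tree, is given below. The dataset and min_count are assumptions (the slide's own example data is not reproduced above, so the output will not match the itemset list shown), and the recursive mining of conditional FP-trees in Step 3 is omitted to keep the sketch short.

from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_count):
    """Steps 1-2: count item supports, drop infrequent items, and insert each
    transaction (sorted by descending support) into a prefix-sharing tree."""
    freq = Counter(i for t in transactions for i in t)
    freq = {i: c for i, c in freq.items() if c >= min_count}    # Step 1 pruning
    root = Node(None, None)
    header = defaultdict(list)          # item -> all tree nodes for that item
    for t in transactions:
        path = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item in node.children:
                node.children[item].count += 1      # shared prefix: just bump the count
            else:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header

transactions = [{"A","B","C"}, {"A","C"}, {"B","C","D"}, {"A","B","D"}, {"A","B","C","D"}]
root, header = build_fp_tree(transactions, min_count=3)
# Step 3 would follow each item's header links upward to build conditional FP-trees.
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})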