Data Mining Experiment 4
Roll No : 160622733182
Name : Tabasum Syed Tajamul
Date: 03/03/2025
Aim: Create the following supermarket data in .arff format
4(a) Apply the Apriori algorithm with support = 0.2, confidence = 0.5, and generate 5
frequent itemsets and rules
4(b) Apply the Apriori algorithm with support = 0.2, lift = 0.5, and generate 5 frequent
patterns and rules
(a) Apply the Apriori algorithm with support = 0.2, confidence = 0.5, and generate 5 frequent
itemsets and rules
Description:
Association Rule Mining: Association Rule Mining is a data mining technique used to identify
relationships between items in large datasets. It helps uncover patterns, such as which products
are frequently bought together in a store. Key metrics include support, which measures how
often an itemset appears in transactions; confidence, which indicates the likelihood of one item
appearing when another does; and lift, which evaluates the strength of an association beyond
random chance.
For example, a supermarket may discover that 80% of customers who buy bread also purchase
butter. This insight can help businesses optimize product placement and marketing strategies.
Popular algorithms for association rule mining include Apriori, which generates frequent
itemsets iteratively, and FP-Growth, which builds a tree structure to find patterns more
efficiently.
Market Basket Analysis: Market Basket Analysis (MBA) is a data mining technique used to identify
patterns in customer purchasing behavior. It helps businesses understand which products are
frequently bought together, enabling better decision-making in sales, marketing, and inventory
management. MBA uses association rule mining to discover relationships between items in
transaction data.
Frequent Item: A frequent item is an item or set of items that appears in a dataset with
a frequency above a specified threshold. In association rule mining, frequent items are identified
using the support metric, which measures how often an item or itemset appears in transactions.
An itemset is a collection of one or more items. If the occurrence of an itemset exceeds a
predefined minimum support threshold, it is considered frequent.
Support: The proportion of transactions that contain a particular item or itemset. It helps identify
frequently bought items.
Formula:
Support(X) = (Transactions containing X) / (Total Transactions)
Confidence: The probability that a customer who buys item X also buys item Y. It measures the
reliability of the association rule.
Formula:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Lift: Measures how much more likely two items are bought together compared to random
chance.
Formula:
Lift(X → Y) = Confidence(X → Y) / Support(Y)
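To make these definitions concrete, the short Python sketch below computes support, confidence,
and lift by hand for a small made-up transaction list; the items and transactions are placeholders
used only to illustrate the formulas, not the lab dataset.

# Hypothetical transactions used only to illustrate the three metrics.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "juice"},
    {"butter", "milk"},
    {"bread", "butter", "juice"},
]
total = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(1 for t in transactions if itemset <= t) / total

def confidence(x, y):
    # Support of X and Y together divided by support of X.
    return support(x | y) / support(x)

def lift(x, y):
    # Confidence of X -> Y divided by the support of Y.
    return confidence(x, y) / support(y)

x, y = {"bread"}, {"butter"}
print(support(x | y))      # 3/5 = 0.60
print(confidence(x, y))    # 0.60 / 0.80 = 0.75
print(lift(x, y))          # 0.75 / 0.80 = 0.9375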
Algorithm Apriori:
1) Collect the dataset: Gather transactional data where each transaction contains a set of
items.
2) Generate frequent 1-itemsets (L1): Compute support for individual items and discard
those below the minimum support threshold.
3) Generate k-itemsets iteratively:
● Use frequent (k-1)-itemsets (Lk-1) to generate candidate k-itemsets (Ck).
● Prune non-frequent subsets and compute support for Ck.
● Retain itemsets meeting the minimum support threshold, forming Lk.
4) Repeat step 3 until no more frequent itemsets can be generated.
5) Extract association rules from the frequent itemsets and evaluate their strength using
confidence, keeping those above the minimum confidence threshold (a sketch of these steps is
given below).
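The following is a minimal Python sketch of the candidate-generation and pruning loop described
above, assuming min_support = 0.2 as in part (a). It only illustrates the steps; the lab itself runs
WEKA's Apriori on the .arff file, and the transaction data here is a placeholder.

from itertools import combinations

def apriori(transactions, min_support=0.2):
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    # Step 2: frequent 1-itemsets (L1).
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 2
    while current:
        frequent.update({itemset: support(itemset) for itemset in current})
        # Step 3: join frequent (k-1)-itemsets to form candidate k-itemsets (Ck).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that contain an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Keep candidates meeting the minimum support threshold (Lk).
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

# Step 5 can then read rules off the returned dictionary, e.g.
# confidence(X -> Y) = frequent[X | Y] / frequent[X], keeping rules with confidence >= 0.5.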
Results:
1) Open Notepad
2) Enter the dataset as follows:
Tid Itemset
T3 {cheese, yogurt}
T5 {egg, juice}
Figure 5: supermarket.arff
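The screenshot of the full dataset is not reproduced here. As a rough guide to the file format, one
common way to lay out a market-basket .arff file for WEKA is sketched below; the attribute list and
the two data rows are placeholders built from the items visible above (cheese, yogurt, egg, juice),
and the remaining transactions of the lab dataset would follow in the same way.

@relation supermarket

@attribute cheese {t, f}
@attribute yogurt {t, f}
@attribute egg    {t, f}
@attribute juice  {t, f}

@data
% T3: {cheese, yogurt}
t,t,f,f
% T5: {egg, juice}
f,f,t,t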
(b) Apply the Apriori algorithm with support = 0.2, lift = 0.5, and generate 5 frequent patterns and
rules
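The rules for part (b) are generated in WEKA from the same .arff file. Purely as an illustrative
cross-check, the sketch below uses the third-party mlxtend library (an assumption, not part of the
lab procedure) to mine frequent itemsets at support = 0.2 and list rules ranked by lift with a 0.5
threshold; the transaction list is a placeholder, not the lab data.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Placeholder transactions; in the lab these come from supermarket.arff.
transactions = [
    ["cheese", "yogurt"],
    ["egg", "juice"],
    ["cheese", "yogurt", "juice"],
    ["egg", "cheese"],
    ["yogurt", "juice"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with minimum support 0.2, then rules filtered and ranked by lift.
itemsets = apriori(onehot, min_support=0.2, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=0.5)
print(itemsets.head(5))
print(rules.sort_values("lift", ascending=False).head(5))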
Results:
1) Open Notepad
2) Enter the dataset as follows: