Ex 1
Aim:
The aim of implementing the Apriori algorithm in data mining is to discover frequent itemsets
and extract meaningful association rules from transactional data. This process helps in
identifying correlations and patterns among items purchased together, which can be valuable for
various applications such as market basket analysis, recommendation systems, and more.
Procedure:
The Apriori algorithm discovers patterns and relationships between items in large transactional
datasets, such as the purchase records used in market basket analysis.
1. Data Preprocessing:
o Data Collection: Obtain transactional data where each transaction consists of a
set of items.
o Data Cleaning: Handle missing values, remove duplicates, and ensure the data is
in the appropriate format for analysis.
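The cleaning step can be sketched in base R as follows. This is a minimal illustration, not part of the original exercise: the raw_baskets data and the clean_basket helper are hypothetical, and "remove duplicates" is assumed here to mean repeated items within a single basket.

```r
# Hypothetical raw baskets with a missing value, a repeated item,
# inconsistent casing/whitespace, and one empty basket.
raw_baskets <- list(
  c("Bread", "milk", "milk"),
  c("bread", NA, "eggs"),
  character(0),
  c(" Milk", "diapers")
)

# Hypothetical helper: clean one basket.
clean_basket <- function(items) {
  items <- items[!is.na(items)]      # drop missing values
  items <- tolower(trimws(items))    # normalise case and whitespace
  items <- items[nzchar(items)]      # drop empty strings
  sort(unique(items))                # keep one copy of each item
}

baskets <- lapply(raw_baskets, clean_basket)
baskets <- baskets[lengths(baskets) > 0]   # discard baskets left empty

print(baskets)
```

The result is a list of character vectors, which is the same shape as the transactions list used in the CODE section and can be passed to as(..., "transactions").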
2. Generate Candidate Itemsets:
o Step 1: Define Minimum Support: Set a minimum support threshold (e.g., 1%,
5%), the smallest fraction of transactions in which an itemset must appear
to be considered frequent.
o Step 2: Generate Candidate Itemsets: Initially, generate candidate itemsets of
length 1 (single items) and calculate their support (frequency of occurrence).
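To make the support calculation concrete, here is a small base-R sketch that counts the support of each single item in the five transactions used in the CODE section. The support_of helper and the 0.5 threshold are assumptions chosen for illustration only.

```r
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)
min_support <- 0.5   # illustrative threshold, not the value used in the exercise

# Hypothetical helper: support = fraction of transactions containing every item
# of the itemset.
support_of <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Candidate itemsets of length 1 are simply the distinct items.
items <- sort(unique(unlist(transactions)))
item_support <- vapply(items, support_of, numeric(1), transactions = transactions)

print(item_support)
print(item_support >= min_support)   # which single items count as frequent
```

Here bread, milk, and diapers each appear in 4 of 5 transactions (support 0.8), coco in 3 (0.6), and eggs in 2 (0.4), so eggs falls below the illustrative 0.5 threshold.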
3. Iterative Frequent Itemset Generation (Apriori Principle):
o Step 3: Prune Non-Frequent Itemsets: Eliminate candidate itemsets that do not
meet the minimum support threshold.
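o Step 4: Generate Larger Itemsets: Use frequent itemsets from the previous step
to generate candidate itemsets of larger sizes (e.g., length 2, 3, etc.). By the
Apriori principle, a candidate can be frequent only if all of its subsets are
frequent, so candidates with an infrequent subset are discarded without counting.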
o Step 5: Repeat: Continue the process iteratively until no new frequent itemsets
can be generated.
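The prune, join, and repeat cycle can be sketched in base R as below. This is a simplified teaching version under stated assumptions (the support_of helper, the has_frequent_subsets helper, and the 0.5 threshold are hypothetical); the arules package uses a far more efficient implementation.

```r
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)
min_support <- 0.5   # illustrative threshold

support_of <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}
is_frequent <- function(s) support_of(s, transactions) >= min_support

# Level 1: frequent single items.
current <- Filter(is_frequent, as.list(sort(unique(unlist(transactions)))))
frequent <- list()
k <- 1

while (length(current) > 0) {
  frequent <- c(frequent, current)   # record the frequent (k)-itemsets
  k <- k + 1

  # Join: union pairs of frequent (k-1)-itemsets that give exactly k items.
  candidates <- list()
  if (length(current) > 1) {
    pairs <- combn(length(current), 2, simplify = FALSE)
    candidates <- lapply(pairs, function(p) {
      sort(union(current[[p[1]]], current[[p[2]]]))
    })
    candidates <- unique(Filter(function(s) length(s) == k, candidates))
  }

  # Prune (Apriori principle): every (k-1)-subset must already be frequent.
  has_frequent_subsets <- function(s) {
    subsets <- combn(s, k - 1, simplify = FALSE)
    all(vapply(subsets, function(sub) {
      any(vapply(current, identical, logical(1), sub))
    }, logical(1)))
  }
  candidates <- Filter(has_frequent_subsets, candidates)

  # Count support and keep only the candidates that meet the threshold.
  current <- Filter(is_frequent, candidates)
}

for (s in frequent) {
  cat(sprintf("{%s}  support = %.1f\n",
              paste(s, collapse = ", "), support_of(s, transactions)))
}
```

The loop stops when a level produces no frequent itemsets, which corresponds to Step 5. Items that fail the threshold at level 1 never appear in any larger candidate.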
4. Extract Association Rules:
o Step 6: Define Minimum Confidence: Specify a minimum confidence threshold
(e.g., 50%, 70%) that determines the strength of association rules to be extracted.
o Step 7: Generate Association Rules: Use the frequent itemsets to generate
association rules that meet the specified confidence threshold.
o Step 8: Evaluate and Rank Rules: Evaluate the extracted rules based on metrics
like confidence, support, and lift to identify the most interesting and actionable
rules.
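The three metrics named in Step 8 can be computed by hand for a few rules. The sketch below uses the transactions from the CODE section; the rule_metrics helper, the chosen rules, and the 0.7 confidence threshold are assumptions for illustration. It uses the standard definitions: confidence(A => B) = support(A and B) / support(A), and lift(A => B) = confidence(A => B) / support(B).

```r
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)

support_of <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Hypothetical helper: support, confidence, and lift of the rule lhs => rhs.
rule_metrics <- function(lhs, rhs, transactions) {
  supp_rule  <- support_of(c(lhs, rhs), transactions)
  confidence <- supp_rule / support_of(lhs, transactions)
  lift       <- confidence / support_of(rhs, transactions)
  data.frame(
    rule = sprintf("{%s} => {%s}",
                   paste(lhs, collapse = ", "), paste(rhs, collapse = ", ")),
    support = supp_rule, confidence = confidence, lift = lift
  )
}

rules <- rbind(
  rule_metrics("diapers", "milk", transactions),
  rule_metrics("bread", "milk", transactions),
  rule_metrics(c("bread", "milk"), "diapers", transactions),
  rule_metrics("coco", "bread", transactions)
)

min_confidence <- 0.7                                # illustrative threshold
rules <- rules[rules$confidence >= min_confidence, ] # Step 7: filter by confidence
rules <- rules[order(-rules$lift), ]                 # Step 8: rank by lift
print(rules, row.names = FALSE)
```

For example, {diapers} => {milk} has support 0.8, confidence 1, and lift 1.25, while {bread} => {milk} has lift below 1, meaning bread and milk co-occur slightly less often than expected if they were independent.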
5. Interpretation and Visualization:
o Step 9: Interpret Results: Analyze and interpret the discovered association rules
to understand relationships between items.
o Step 10: Visualize Results: Use plots, graphs, or tables to visualize important
rules and patterns for easier interpretation and presentation.
6. Implementation in R:
o Use the R programming language with the arules package to implement the
Apriori algorithm, generate frequent itemsets, and extract association rules, as
shown in the code below.
CODE:
# Install (if needed) and load the arules package
if (!requireNamespace("arules", quietly = TRUE)) {
  install.packages("arules")
}
library(arules)
# Step 1: Define transaction data (one character vector per transaction)
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)
# Step 2: Convert the list to the "transactions" class used by the arules package
trans <- as(transactions, "transactions")
# Step 3: Save the transaction data to an .RData file
save(trans, file = "transaction_data.RData")
# Step 4: Clear the current workspace (removes all objects, including trans)
rm(list = ls())
# Step 5: Load the transaction data back from the saved file
load("transaction_data.RData")
# Step 6: Mine association rules with the Apriori algorithm.
# The default target is "rules", so apriori() finds the frequent itemsets
# (support >= 0.2) internally and returns rules with confidence >= 0.6.
association_rules <- apriori(trans, parameter = list(support = 0.2, confidence = 0.6))
# Step 7: Rank the rules by lift, strongest first
association_rules <- sort(association_rules, by = "lift")
# Step 8: Display the association rules
inspect(association_rules)
OUTPUT
Apriori
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
        0.6    0.1    1 none FALSE            TRUE       5     0.2      1     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
OUTPUT EXPLANATION
Parameter Specification:
• confidence: Minimum confidence level for the rules generated is set to 0.6 (60%).
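• minval, smax, arem, aval: These are mining parameters, not algorithmic control settings.
smax is the maximum support an itemset or rule may have (1, so there is no upper limit).
arem selects an additional rule evaluation measure ("none" here), minval is the minimum
value required for that measure (0.1), and aval controls whether the measure is returned
(FALSE). With arem set to "none", minval and aval have no effect.
• originalSupport: TRUE means the minimum support is applied to the whole rule (LHS and RHS
together); if FALSE, the support of the LHS alone is used instead.
• maxtime: Time limit for subset checking, set to 5 seconds.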
• support: Minimum support level for frequent itemsets is set to 0.2 (20%).
• minlen: Minimum number of items per rule (LHS and RHS combined) is 1, so rules with an
empty LHS are allowed.
• maxlen: Maximum number of items per rule (LHS and RHS combined) is 10.
• target: The algorithm mines association rules ("rules").
• ext: TRUE means extended quality information (such as the support of the rule's LHS) is
reported alongside support and confidence.