
EX1: Implement the Apriori algorithm to extract association rules in data mining.

Aim:
The aim of implementing the Apriori algorithm in data mining is to discover frequent itemsets
and extract meaningful association rules from transactional data. This helps identify
correlations and patterns among items that are purchased together, which is valuable for
applications such as market basket analysis and recommendation systems.
Procedure:
The Apriori algorithm is useful for discovering patterns and relationships between items in
large datasets; market basket analysis is the classic application.
1. Data Preprocessing:
o Data Collection: Obtain transactional data where each transaction consists of a
set of items.
o Data Cleaning: Handle missing values, remove duplicates, and ensure the data is
in the appropriate format for analysis.
2. Generate Candidate Itemsets:
o Step 1: Define Minimum Support: Set a minimum support threshold (e.g., 1%,
5%) that determines the minimum frequency an itemset must appear in the dataset
to be considered frequent.
o Step 2: Generate Candidate Itemsets: Initially, generate candidate itemsets of
length 1 (single items) and calculate their support (frequency of occurrence).
3. Iterative Frequent Itemset Generation (Apriori Principle):
o Step 3: Prune Non-Frequent Itemsets: Eliminate candidate itemsets that do not
meet the minimum support threshold.
o Step 4: Generate Larger Itemsets: Use the frequent itemsets from the previous step
to generate candidate itemsets of larger sizes (length 2, 3, and so on); a hand-rolled
sketch of this generate-and-prune loop is given just before the CODE section.
o Step 5: Repeat: Continue the process iteratively until no new frequent itemsets
can be generated.
4. Extract Association Rules:
o Step 6: Define Minimum Confidence: Specify a minimum confidence threshold
(e.g., 50%, 70%) that determines the strength of association rules to be extracted.
o Step 7: Generate Association Rules: Use the frequent itemsets to generate
association rules that meet the specified confidence threshold.
o Step 8: Evaluate and Rank Rules: Evaluate the extracted rules based on metrics
like confidence, support, and lift to identify the most interesting and actionable
rules.
5. Interpretation and Visualization:
o Step 9: Interpret Results: Analyze and interpret the discovered association rules
to understand relationships between items.
o Step 10: Visualize Results: Use plots, graphs, or tables to visualize important
rules and patterns for easier interpretation and presentation.
6. Implementation in R:
o Use the R programming language with packages such as arules to implement the
Apriori algorithm, generate frequent itemsets, and extract association rules, as
demonstrated in the code below.
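
Before the arules-based solution in the CODE section, the generate-and-prune idea of steps 2
and 3 can be illustrated directly in base R. The sketch below is illustrative only and is not
part of the required solution; it uses the same five example transactions as the code that
follows, and the names itemset_support, c1, l1, c2, and l2 are arbitrary.

# Illustrative only: manual generate-and-prune for itemsets of size 1 and 2
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)
min_support <- 0.4

# Support of an itemset = fraction of transactions containing every item in it
itemset_support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# Candidate 1-itemsets (C1) and their supports
items <- sort(unique(unlist(transactions)))
c1 <- sapply(items, itemset_support)
print(c1)

# Prune non-frequent candidates to obtain the frequent 1-itemsets (L1)
l1 <- names(c1[c1 >= min_support])

# Build candidate 2-itemsets (C2) from L1 only, then prune again to get L2
c2 <- combn(l1, 2, simplify = FALSE)
l2 <- Filter(function(s) itemset_support(s) >= min_support, c2)
print(l2)

Every frequent 2-itemset found this way is built only from frequent 1-itemsets, which is
exactly the pruning guarantee the Apriori principle provides.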

CODE:
# Install and load the arules package if it is not already available
if (!requireNamespace("arules", quietly = TRUE)) {
  install.packages("arules")
}
library(arules)

# Step 1: Define the transaction data (each element is one market basket)
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)

# Step 2: Convert the list into the "transactions" class used by the arules package
trans <- as(transactions, "transactions")

# Step 3: Save the transaction data to an R data file
save(trans, file = "transaction_data.RData")

# Step 4: Clear the current workspace
rm(list = ls())

# Step 5: Load the transaction data back from the saved file
load("transaction_data.RData")

# Step 6: Run the Apriori algorithm (the default target is "rules", so this call
# mines frequent itemsets and derives association rules in one step)
frequent_itemsets <- apriori(trans, parameter = list(support = 0.2, confidence = 0.6))

# Step 7: Coerce the result to the "rules" class (a formality here, since the
# default target already returns rules)
association_rules <- as(frequent_itemsets, "rules")

# Step 8: Display the association rules
inspect(association_rules)

OUTPUT

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
        0.6    0.1    1 none FALSE            TRUE       5     0.2      1     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1


set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5 item(s), 5 transaction(s)] done [0.00s].
sorting and recoding items ... [5 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [32 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
>
> # Step 7: Extract association rules
> association_rules <- as(frequent_itemsets, "rules")
>
> # Step 8: Display the association rules
> inspect(association_rules)
     lhs                       rhs       support confidence coverage lift      count
[1]  {}                     => {coco}    0.6     0.6000000  1.0      1.0000000 3
[2]  {}                     => {bread}   0.8     0.8000000  1.0      1.0000000 4
[3]  {}                     => {diapers} 0.8     0.8000000  1.0      1.0000000 4
[4]  {}                     => {milk}    0.8     0.8000000  1.0      1.0000000 4
[5]  {eggs}                 => {diapers} 0.4     1.0000000  0.4      1.2500000 2
[6]  {eggs}                 => {milk}    0.4     1.0000000  0.4      1.2500000 2
[7]  {coco}                 => {bread}   0.4     0.6666667  0.6      0.8333333 2
[8]  {coco}                 => {diapers} 0.4     0.6666667  0.6      0.8333333 2
[9]  {coco}                 => {milk}    0.4     0.6666667  0.6      0.8333333 2
[10] {bread}                => {diapers} 0.6     0.7500000  0.8      0.9375000 3
[11] {diapers}              => {bread}   0.6     0.7500000  0.8      0.9375000 3
[12] {bread}                => {milk}    0.6     0.7500000  0.8      0.9375000 3
[13] {milk}                 => {bread}   0.6     0.7500000  0.8      0.9375000 3
[14] {diapers}              => {milk}    0.8     1.0000000  0.8      1.2500000 4
[15] {milk}                 => {diapers} 0.8     1.0000000  0.8      1.2500000 4
[16] {coco, eggs}           => {diapers} 0.2     1.0000000  0.2      1.2500000 1
[17] {coco, eggs}           => {milk}    0.2     1.0000000  0.2      1.2500000 1
[18] {bread, eggs}          => {diapers} 0.2     1.0000000  0.2      1.2500000 1
[19] {bread, eggs}          => {milk}    0.2     1.0000000  0.2      1.2500000 1
[20] {diapers, eggs}        => {milk}    0.4     1.0000000  0.4      1.2500000 2
[21] {eggs, milk}           => {diapers} 0.4     1.0000000  0.4      1.2500000 2
[22] {coco, diapers}        => {milk}    0.4     1.0000000  0.4      1.2500000 2
[23] {coco, milk}           => {diapers} 0.4     1.0000000  0.4      1.2500000 2
[24] {bread, diapers}       => {milk}    0.6     1.0000000  0.6      1.2500000 3
[25] {bread, milk}          => {diapers} 0.6     1.0000000  0.6      1.2500000 3
[26] {diapers, milk}        => {bread}   0.6     0.7500000  0.8      0.9375000 3
[27] {coco, diapers, eggs}  => {milk}    0.2     1.0000000  0.2      1.2500000 1
[28] {coco, eggs, milk}     => {diapers} 0.2     1.0000000  0.2      1.2500000 1
[29] {bread, diapers, eggs} => {milk}    0.2     1.0000000  0.2      1.2500000 1
[30] {bread, eggs, milk}    => {diapers} 0.2     1.0000000  0.2      1.2500000 1
[31] {coco, bread, diapers} => {milk}    0.2     1.0000000  0.2      1.2500000 1
[32] {coco, bread, milk}    => {diapers} 0.2     1.0000000  0.2      1.2500000 1

OUTPUT EXPLANATION
Parameter Specification:
• confidence: Minimum confidence level for the generated rules, set to 0.6 (60%).
• minval, smax, arem, aval: Settings for an optional additional rule evaluation measure
(arem), its minimum value (minval), whether its value is reported (aval), and the maximum
allowed support (smax); the defaults are used here, so no additional measure is applied.
• originalSupport: Indicates whether rule support is computed from the whole rule (LHS and
RHS together) rather than from the LHS alone; TRUE here.
• maxtime: Maximum time allowed for checking subsets, set to 5 seconds.
• support: Minimum support level for frequent itemsets, set to 0.2 (20%).
• minlen: Minimum length of the itemsets considered when generating rules, 1.
• maxlen: Maximum length of the itemsets considered when generating rules, 10.
• target: The mining target is association rules.
• ext: Indicates whether extended rule information (such as coverage) is reported; TRUE here.
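
For reference, the same settings can be passed to apriori() explicitly. The sketch below simply
restates the run from the CODE section with the relevant defaults written out; it assumes the
trans object created earlier.

# Equivalent call with the mining parameters written out explicitly
rules <- apriori(
  trans,
  parameter = list(
    support    = 0.2,     # minimum support (20%)
    confidence = 0.6,     # minimum confidence (60%)
    minlen     = 1,       # smallest rule length considered
    maxlen     = 10,      # largest rule length considered
    target     = "rules"  # mine association rules rather than plain itemsets
  ),
  control = list(verbose = TRUE)  # print the progress log shown above
)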

Association Rules Output:


The association rules output shows each discovered rule along with several metrics (a short
sketch for ranking rules by these metrics follows the list):
• lhs: Left-hand side of the rule, indicating the antecedent (items before the =>).
• rhs: Right-hand side of the rule, indicating the consequent (items after the =>).
• support: The proportion of transactions that contain both the antecedent and consequent
itemsets.
• confidence: The likelihood that the consequent itemset is purchased given that the
antecedent itemset is purchased.
• coverage: The proportion of transactions that contain the antecedent itemset.
• lift: The ratio of observed support to expected support if the antecedent and consequent
were independent.
• count: The number of transactions that contain both the antecedent (lhs) and consequent
(rhs) itemsets.
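
These measures can be used to rank and filter the mined rules, as mentioned in Step 8 of the
procedure. The sketch below assumes the association_rules object produced in the CODE section
and uses standard arules helpers; treat it as a starting point rather than a fixed recipe.

# View the quality measures of the first few rules as a data frame
head(quality(association_rules))

# Rank all rules by lift, strongest associations first
inspect(sort(association_rules, by = "lift"))

# Keep only rules with lift above 1 and high confidence, then show the top five
strong_rules <- subset(association_rules, subset = lift > 1 & confidence >= 0.8)
inspect(sort(strong_rules, by = "confidence")[1:5])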
Example Interpretation:
For instance, consider a few of the rules from the output:
1. {coco} => {bread}:
o Support: 0.4 (appears in 40% of transactions)
o Confidence: 0.67 (67% of transactions that contain coco also contain bread)
o Lift: 0.83 (coco and bread are less likely to be bought together than expected if
independent)
2. {eggs} => {diapers}:
o Support: 0.4 (appears in 40% of transactions)
o Confidence: 1.0 (100% of transactions that contain eggs also contain diapers)
o Lift: 1.25 (eggs and diapers are more likely to be bought together than expected if
independent)
3. {bread, milk} => {diapers}:
o Support: 0.6 (appears in 60% of transactions)
o Confidence: 1.0 (100% of transactions that contain bread and milk also contain
diapers)
o Lift: 1.25 (bread, milk, and diapers are more likely to be bought together than
expected if independent)
Support measures how frequently a set of items (itemset) appears together in the dataset. It
indicates the proportion of transactions in the dataset that contain the specific itemset.
Confidence measures the reliability or certainty of the inference made by a rule. It indicates how
likely item Y is purchased when item X is purchased, expressed as a conditional probability.
Example:
Consider the following association rule:
• {bread, milk} => {diapers}
o Support: 0.6 (60% of transactions contain bread, milk, and diapers together)
o Confidence: 1.0 (100% of transactions containing bread and milk also contain
diapers)
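
As a sanity check, these numbers can be recomputed by hand from the five transactions. The
sketch below is illustrative only; the helper name contains is arbitrary, and the result matches
rule [25] in the output above.

# Illustrative only: recompute the metrics of {bread, milk} => {diapers} by hand
transactions <- list(
  c("bread", "milk", "diapers"),
  c("bread", "coco"),
  c("milk", "diapers", "coco", "eggs"),
  c("bread", "milk", "diapers", "coco"),
  c("bread", "milk", "diapers", "eggs")
)
n <- length(transactions)

# Which transactions contain every item of a given itemset?
contains <- function(itemset) sapply(transactions, function(t) all(itemset %in% t))

supp_rule <- sum(contains(c("bread", "milk", "diapers"))) / n  # support  = 0.6
supp_lhs  <- sum(contains(c("bread", "milk"))) / n             # coverage = 0.6
supp_rhs  <- sum(contains("diapers")) / n                      # 0.8

confidence <- supp_rule / supp_lhs   # 1.0
lift       <- confidence / supp_rhs  # 1.25
c(support = supp_rule, confidence = confidence, lift = lift)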
