DWM Exp5 A49
Aim: Implementation of Association Rule Mining Algorithm (Apriori) using Weka Tool
Outcome:
After successful completion of this experiment students will be able to
1. Demonstrate an understanding of the importance of data mining
2. Organize and prepare the data needed for data mining using preprocessing techniques
3. Perform exploratory analysis of the data to be used for mining.
4. Implement the appropriate data mining methods like Frequent Pattern mining on large data sets.
Theory:
Apriori Algorithm
The Apriori algorithm is used to mine association rules between objects, i.e., to discover how two or more objects are related to one another. In other words, Apriori is an association rule learning algorithm that can reveal, for example, that people who bought product A also tend to buy product B.
Support
Support is defined as the ratio of the number of times an item occurs in the transactions to the total number of
transactions. This metric thus defines the probability of the occurrence of each individual item in the transactions.
The same logic can be extended to item sets.
S(IA)=Occ(IA)/Total Transactions
where IA is item A, Occ(IA) is the number of occurrences of item A, and S(IA) = support of item A
For example, in a retail store, 250 out of 2000 transactions over a day might include a purchase of apples. Using
the formula:
S(IApples)=250/2000=0.125
This result implies there is a 12.5% chance that apples were bought that day.
You can indicate a required minimum support threshold when applying the Apriori algorithm. This means that
any item or itemset with support less than the specified minimum support will be considered infrequent.
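The support computation above can be sketched in Python; the support helper and the toy transaction list are illustrative, not part of Weka:

```python
# Support of an item: fraction of transactions that contain it.
def support(item, transactions):
    occ = sum(1 for t in transactions if item in t)
    return occ / len(transactions)

# Toy data mirroring the example: 250 of 2000 transactions contain apples.
transactions = [{"apples"}] * 250 + [{"other"}] * 1750
print(support("apples", transactions))  # 0.125
```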
Confidence
The confidence metric measures the probability that items or itemsets occur together in transactions. For example, if a rule involves two items, the presence of the first item is taken to imply the presence of the second. The first item or itemset is the antecedent, and the second is the consequent.
Confidence is thus defined as the ratio of the number of transactions containing both the antecedent and the consequent to the number of transactions containing the antecedent. This is represented as:
C(A,B)=Occ(A∩B)/Occ(A)
where A is the antecedent, B is the consequent, and C(A,B) is the confidence that A leads to B.
Extending the preceding example, assume that there are 150 transactions where apples and bananas were
purchased together. The confidence is calculated as:
C(Apples,Bananas)=150/250=0.6
This result indicates a 60% chance that an apple purchase then leads to a banana purchase. Similarly, assuming a
total of 500 transactions for bananas, then the confidence that a banana purchase leads to an apple purchase is
calculated as:
C(Bananas,Apples)=150/500=0.3
Here, there is just a 30% chance that a banana purchase leads to an apple purchase.
While confidence is a good measure of likelihood, it does not guarantee a real association between items: confidence can be high simply because the consequent is popular on its own. For this reason, a minimum confidence threshold is applied to filter out weak rules when mining association rules.
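The confidence computation can likewise be sketched in Python; the helper and the toy counts mirroring the apples/bananas example are illustrative:

```python
# Confidence of a rule A -> B: C(A,B) = Occ(A and B) / Occ(A).
def confidence(antecedent, consequent, transactions):
    occ_a = sum(1 for t in transactions if antecedent <= t)
    occ_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return occ_ab / occ_a

# Toy data mirroring the example: 2000 transactions in total,
# 250 with apples, 500 with bananas, 150 with both.
transactions = ([{"apples", "bananas"}] * 150
                + [{"apples"}] * 100
                + [{"bananas"}] * 350
                + [set()] * 1400)
print(confidence({"apples"}, {"bananas"}, transactions))  # 0.6
print(confidence({"bananas"}, {"apples"}, transactions))  # 0.3
```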
Lift
Lift is the factor by which the likelihood of item A leading to item B exceeds the baseline likelihood of item B being purchased on its own. This metric quantifies the strength of the association between A and B, and helps indicate whether there is a real relationship between the items in an itemset or whether they are grouped together by coincidence.
L(A,B)=C(A,B)/S(B)
where L(A,B) is the lift for item A leading to item B, C(A,B) is the confidence that item A leads to item B, and S(B) is the support of the consequent, item B.
For the example above, with S(Bananas)=500/2000=0.25:
L(Apples,Bananas)=0.6/0.25=2.4
A lift greater than 1 indicates that apples and bananas are purchased together more often than would be expected if the two purchases were independent; here, 2.4 times more often. Note that lift is symmetric, so the reverse rule gives the same value:
L(Bananas,Apples)=0.3/0.125=2.4
A lift close to 1 would suggest that the items co-occur only by coincidence, while a lift below 1 would indicate a negative association.
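These computations can be sketched in Python; the helper and the toy counts are illustrative, using the standard definition lift(A,B) = confidence(A,B) / support(B):

```python
# Lift of a rule A -> B from raw counts over n transactions:
# lift = confidence(A -> B) / support(B) = (occ_ab / occ_a) / (occ_b / n).
def lift(occ_a, occ_ab, occ_b, n):
    return (occ_ab / occ_a) / (occ_b / n)

# 2000 transactions: 250 with apples, 500 with bananas, 150 with both.
print(lift(250, 150, 500, 2000))  # apples -> bananas: 2.4
print(lift(500, 150, 250, 2000))  # bananas -> apples: 2.4 (lift is symmetric)
```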
PART B (To be completed by students)
Question of Curiosity:
Q.1] What is the purpose of the Apriori algorithm in association rule mining and how does it work in Weka?
Ans: The Apriori algorithm is used in association rule mining to discover frequent itemsets within a dataset
and generate rules that describe relationships between these items. Its primary goal is to identify patterns of
items that frequently occur together, which can be useful for market basket analysis, recommendation
systems, and other data-driven decision-making processes. The steps to run it in Weka are as follows:
1. Loading Data: In Weka, you begin by loading a transactional dataset into the "Explorer" mode, typically
in ARFF format.
2. Selecting the Algorithm: Navigate to the "Associate" tab and choose "Apriori" from the "Choose" button
in the "Associator" section.
3. Configuring Parameters: Set parameters such as "lowerBoundMinSupport" (minimum support
threshold) and "minMetric" (minimum confidence for rules). These parameters control the frequency
of itemsets and the reliability of the rules generated.
4. Running the Algorithm: Click "Start" to execute the algorithm. Weka will process the data to identify
frequent itemsets and generate association rules based on the specified thresholds.
5. Interpreting Results: After execution, Weka provides output including frequent itemsets and associated
rules, displaying metrics like support and confidence. This helps in understanding which items
frequently occur together and the strength of their relationships.
The Apriori algorithm works by iteratively finding frequent itemsets, pruning those that do not meet the
support threshold, and then generating rules based on the remaining itemsets, which are evaluated for
confidence.
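The iterative level-wise search described above can be sketched in Python; this is an illustrative re-implementation of the idea, not Weka's own code:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: return frequent itemsets with their support."""
    n = len(transactions)
    # Level 1: candidate itemsets are the single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    while current:
        # Count support of each candidate and prune infrequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        frequent.update(survivors)
        # Join step: build size-(k+1) candidates from frequent k-itemsets,
        # then prune any candidate with an infrequent subset (Apriori property).
        keys = list(survivors)
        size = max((len(k) for k in keys), default=0) + 1
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        current = {c for c in current
                   if all(frozenset(s) in survivors
                          for s in combinations(c, size - 1))}
    return frequent
```

For example, `apriori([{"a","b"}, {"a","b"}, {"a","c"}, {"b","c"}], 0.5)` keeps the pair {a, b} (support 0.5) but prunes {a, c} and {b, c}, which each appear in only one of the four transactions.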
Q.2] What are some common challenges when using the Apriori algorithm in Weka and how can they be
addressed?
Ans: Common challenges with the Apriori algorithm in Weka, and ways to address them, are as follows:
1. Data format: Weka's Apriori associator works on nominal attributes, so numeric fields must be discretized (for example with the Discretize filter) and transactions encoded as nominal or binary attributes before mining.
2. Combinatorial explosion: with many items or a low support threshold, the number of candidate itemsets grows rapidly, causing long runtimes and high memory use; raising the minimum support or removing rarely used attributes helps.
3. Threshold selection: support and confidence thresholds that are too high produce few or no rules, while thresholds that are too low flood the output with trivial rules, so they usually have to be tuned iteratively.
4. Redundant or misleading rules: many generated rules are obvious or driven purely by popular items; ranking rules by a metric such as lift instead of confidence alone helps filter them out.
Q.3] How can the results of the Apriori algorithm be interpreted and applied in a real-world scenario?
Ans:
The results of the Apriori algorithm can be interpreted in the following ways:
1. Frequent Itemsets: These are sets of items that appear together in transactions above a specified support
threshold. They indicate which items are commonly purchased or used together.
2. Association Rules: These rules describe relationships between items, such as "If item A is purchased,
then item B is likely to be purchased." Rules are evaluated using metrics like support (how often items
appear together), confidence (the likelihood of an item being purchased given the presence of another
item), and lift (the strength of the rule compared to random chance).
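As an illustration, these rule metrics can be derived from a table of itemset supports; the rules_from helper and the support values below are hypothetical, reusing the apples/bananas numbers from Part A (S(Apples and Bananas) = 150/2000 = 0.075):

```python
from itertools import combinations

def rules_from(itemset_support, min_conf):
    """Generate rules A -> B with confidence and lift from itemset supports."""
    rules = []
    for itemset, s_ab in itemset_support.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                a = frozenset(ante)
                b = itemset - a
                conf = s_ab / itemset_support[a]       # confidence(A -> B)
                if conf >= min_conf:
                    rule_lift = conf / itemset_support[b]  # lift(A -> B)
                    rules.append((set(a), set(b), conf, rule_lift))
    return rules

supports = {frozenset({"apples"}): 0.125,
            frozenset({"bananas"}): 0.25,
            frozenset({"apples", "bananas"}): 0.075}
for a, b, conf, rule_lift in rules_from(supports, 0.3):
    print(a, "->", b, f"conf={conf:.2f} lift={rule_lift:.2f}")
```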
The results can be applied in real-world scenarios in several ways:
1. Market Basket Analysis: Retailers can use the results to identify which products are frequently bought
together. This information helps in designing effective store layouts, creating bundled offers, and
targeting promotions to increase sales.
2. Recommendation Systems: Online retailers and streaming services can use association rules to
recommend products or content based on user behavior patterns, improving customer experience and
engagement.
3. Inventory Management: By understanding which items are often purchased together, businesses can
optimize inventory levels and placement, reducing stockouts and overstock situations.
4. Cross-Selling Opportunities: Businesses can identify opportunities for cross-selling by targeting
customers who purchase specific items with related products or services.
5. Customer Segmentation: Results can help segment customers based on their purchasing patterns,
allowing for more personalized marketing strategies and product recommendations.
****************************