Student Name: Srijan Dawn University Roll No: 11700222095 University Registration No: 221170110353 3 Year, 6 Semester Program Name
2. Introduction :
Data mining is a crucial field in computer science
that enables organizations to discover hidden
patterns in large datasets. The Apriori algorithm,
introduced by Rakesh Agrawal and Ramakrishnan
Srikant in 1994, is one of the most widely used
algorithms for discovering association rules. It
operates on the principle of identifying frequent
itemsets in a transactional database and generating
strong association rules. This report provides an
in-depth exploration of the Apriori algorithm’s
working mechanism and its applications.
3. Working Principle of Apriori Algorithm :
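The Apriori algorithm is based on the Apriori
property, which states that "if an itemset is
frequent, then all of its subsets must also be
frequent." Equivalently, any itemset with an
infrequent subset cannot itself be frequent, which
is what allows candidates to be pruned. The
algorithm follows these steps: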
i. Set Minimum Support and Confidence:
Define minimum support and confidence
thresholds to filter out less significant rules.
ii. Generate Frequent Itemsets:
a. Scan the dataset to determine the
frequency of individual items.
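b. Generate candidate k-itemsets by combining
frequent (k-1)-itemsets from the previous step.
c. Prune candidates that contain an infrequent
subset (the Apriori property), then scan the
dataset and keep the candidates that meet the
minimum support.
d. Repeat until no more frequent itemsets are
found.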
iii. Generate Association Rules:
a. Use frequent itemsets to generate
association rules.
b. Compute confidence for each rule.
c. Retain rules that meet the confidence
threshold.
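The frequent-itemset part of these steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name apriori, its parameters, and the use of an absolute support count (instead of a fraction) are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset.

    min_count is an absolute support threshold (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions for the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent
```

The loop stops as soon as a pass produces no frequent itemsets, which corresponds to step ii(d). The pairwise join used here is simpler than the prefix-based join in the original paper, but it produces the same frequent itemsets on small inputs.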
4. Example :
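Consider the transaction dataset below. No
threshold is stated for this example; assuming a
minimum support count of 2 (40%), every itemset
listed in the steps that follow is frequent.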
Transaction ID    Items Purchased
T1                Milk, Bread, Butter
T2                Bread, Butter
T3                Milk, Bread
T4                Milk, Butter
T5                Milk, Bread, Butter
Step 1: Generate Frequent 1-itemsets
{Milk}: 4 occurrences
{Bread}: 4 occurrences
{Butter}: 4 occurrences
Step 2: Generate Frequent 2-itemsets
{Milk, Bread}: 3 occurrences
{Milk, Butter}: 3 occurrences
{Bread, Butter}: 3 occurrences
Step 3: Generate Frequent 3-itemsets
{Milk, Bread, Butter}: 2 occurrences
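One way to check the counts in Steps 1 to 3 is to recount them directly from the transaction table. The short Python sketch below does this by brute force; the helper name support_count is hypothetical and chosen only for this illustration.

```python
from itertools import combinations

# The five transactions from the example table.
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Milk", "Bread", "Butter"},  # T5
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Enumerate every 1-, 2- and 3-itemset and print its support count.
items = sorted(set().union(*transactions))
for k in (1, 2, 3):
    for itemset in combinations(items, k):
        print(itemset, support_count(itemset, transactions))
```

Enumerating every combination is only practical for a toy dataset like this one; the point of the Apriori property is to avoid that enumeration on real data.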
Step 4: Generate Association Rules
Rule: {Milk} → {Bread} (Support = 3/5,
Confidence = 3/4)
Rule: {Bread, Butter} → {Milk} (Support = 2/5,
Confidence = 2/3)
Only rules meeting the minimum support and
confidence thresholds are retained.
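The rule metrics used in Step 4 follow the usual definitions: support(X → Y) is the fraction of transactions containing both X and Y, and confidence(X → Y) = support(X ∪ Y) / support(X). A minimal Python sketch of rule generation from one frequent itemset is given below. The function names and the 0.7 confidence threshold are assumptions for illustration, since the example does not state a threshold.

```python
from itertools import combinations

# The five transactions from the example table.
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Milk", "Bread", "Butter"},  # T5
]


def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset, min_confidence):
    """List (antecedent, consequent, support, confidence) for rules
    built from one itemset, assumed to occur in at least one transaction."""
    itemset = frozenset(itemset)
    total = count(itemset)
    found = []
    # Try every non-empty proper subset as the antecedent.
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), r)):
            confidence = total / count(antecedent)
            if confidence >= min_confidence:
                found.append((antecedent, itemset - antecedent,
                              total / len(transactions), confidence))
    return found


# 0.7 is a hypothetical minimum confidence, not a value from the report.
for a, c, s, conf in rules_from({"Milk", "Bread"}, 0.7):
    print(sorted(a), "->", sorted(c), f"support={s:.2f} confidence={conf:.2f}")
```

Under this assumed threshold, {Milk} → {Bread} (confidence 3/4) would be retained, while {Bread, Butter} → {Milk} (confidence 2/3) would be discarded.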
5. Application :
a) Market Basket Analysis: Identifying items
that are frequently bought together to improve
sales strategies.
b) Fraud Detection: Discovering unusual
patterns in financial transactions.
c) Medical Diagnosis: Finding associations
between symptoms and diseases.
d) Recommendation Systems: Providing
personalized suggestions based on user
preferences.
e) Web Usage Mining: Analyzing browsing
behavior to enhance user experience.
6. Conclusion :
The Apriori algorithm is a powerful technique in
data mining for discovering frequent itemsets and
association rules. By leveraging the Apriori
property, it prunes candidate itemsets that cannot
be frequent, which reduces the number of
candidates that must be counted on each pass. Its
applications extend across various domains,
including retail, healthcare, and finance, making it
an indispensable tool for extracting valuable
insights from large datasets.
7. References:
Han, J., Kamber, M., & Pei, J. (2011). Data Mining:
Concepts and Techniques. Elsevier.
https://www.geeksforgeeks.org/apriori-algorithm/
https://en.wikipedia.org/wiki/Apriori_algorithm