Apriori Algorithm in Data Mining

The document discusses the Apriori algorithm for frequent itemset mining. It begins by defining key concepts like itemsets, frequent itemsets, and minimum support. It then explains the steps of the Apriori algorithm, which iteratively generates candidate itemsets and prunes those that do not meet minimum support. An example is provided to illustrate the algorithm's join and prune steps. The document aims to provide an overview of how the Apriori algorithm can be used to discover frequent patterns and association rules in transactional datasets.

Uploaded by

Nair Shreeja


Apriori Algorithm In Data Mining: Implementation With Examples

There are several methods for Data Mining such as association, correlation, classification &
clustering.

This tutorial primarily focuses on mining using association rules. By association rules, we identify
the set of items or attributes that occur together in a table.

What You Will Learn:

• What Is An Itemset?
  o What Is A Frequent Itemset?
  o Frequent Pattern Mining (FPM)
  o Association Rules
• Why Frequent Itemset Mining?
  o Apriori Algorithm – Frequent Pattern Algorithms
    • Steps In Apriori
    • Advantages
    • Disadvantages
    • Methods To Improve Apriori Efficiency
    • Applications Of Apriori Algorithm
    • Conclusion

What Is An Itemset?
A set of items taken together is called an itemset. An itemset that contains k items is called a
k-itemset, so a single item is a 1-itemset and a pair of items is a 2-itemset. An itemset that occurs
frequently is called a frequent itemset. Frequent itemset mining is thus a data mining technique for
identifying items that often occur together.
For example: bread and butter, laptop and antivirus software, etc.

What Is A Frequent Itemset?

An itemset is called frequent if its support meets a minimum threshold. Support is the fraction of
transactions that contain all the items of the itemset together. Confidence applies to a rule between
itemsets rather than to a single itemset: the confidence of A => B is the fraction of the transactions
containing A that also contain B.

Frequent itemset mining keeps only the itemsets that meet the minimum support threshold. Rules
derived from those itemsets are kept only if they also meet the minimum confidence threshold.
Insights from these mining algorithms can help a business cut costs and improve its competitive
advantage.

There is a tradeoff between the time taken to mine the data and the volume of data to be mined. A
good frequent mining algorithm finds the hidden patterns of itemsets in a short time and with low
memory consumption.
Frequent Pattern Mining (FPM)
The frequent pattern mining algorithm is one of the most important techniques of data mining to
discover relationships between different items in a dataset. These relationships are represented in the
form of association rules. It helps to find the irregularities in data.

FPM has many applications in the field of data analysis, software bugs, cross-marketing, sale
campaign analysis, market basket analysis, etc.

Frequent itemsets discovered through Apriori are used in many data mining tasks, such as finding
interesting patterns in a database and finding sequences. Mining of association rules is the most
important of these tasks.

Association rules apply to supermarket transaction data, that is, to examine the customer behavior in
terms of the purchased products. Association rules describe how often the items are purchased
together.

Association Rules
Association Rule Mining is defined as:
“Let I = { … } be a set of ‘n’ binary attributes called items. Let D = { … } be a set of transactions
called the database. Each transaction in D has a unique transaction ID and contains a subset of the
items in I. A rule is defined as an implication of the form X -> Y, where X, Y ⊆ I and X ∩ Y = ∅. The
sets of items X and Y are called the antecedent and consequent of the rule, respectively.”
Learning of association rules is used to find relationships between attributes in large databases. An
association rule A => B has the form: “for a set of transactions, some value of itemset A
determines the values of itemset B, under the condition that minimum support and confidence
are met”.

Support and Confidence can be represented by the following example:

Bread => butter [support=2%, confidence=60%]
The above statement is an example of an association rule. It means that 2% of all transactions
contain bread and butter together, and that 60% of the customers who bought bread also bought
butter.

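Support and Confidence for itemsets A and B are given by the formulas:

Support(A) = (number of transactions containing A) / (total number of transactions)
Confidence(A => B) = Support(A ∪ B) / Support(A)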
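Support and confidence can be computed directly from a list of transactions. The following is a minimal sketch, assuming each transaction is stored as a Python set; the basket data and the function names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy basket data (illustrative only, not from the tutorial).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A union B) / support(A): how often B appears when A does."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print("support(bread, butter):", support({"bread", "butter"}, transactions))
print("confidence(bread => butter):", confidence({"bread"}, {"butter"}, transactions))
```

Here support is expressed as a fraction; multiplying by the number of transactions gives the absolute count that the worked example uses as min_sup.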

Association rule mining consists of 2 steps:
1. Find all the frequent itemsets.
2. Generate association rules from the above frequent itemsets.

Why Frequent Itemset Mining?

Frequent itemset or pattern mining is broadly used because of its wide applications in mining
association rules, correlations, graph patterns, constraints based on frequent patterns,
sequential patterns, and many other data mining tasks.

Apriori Algorithm – Frequent Pattern Algorithms

The Apriori algorithm was the first algorithm proposed for frequent itemset mining. It was later
improved by R. Agrawal and R. Srikant and came to be known as Apriori. The algorithm uses two
steps, “join” and “prune”, to reduce the search space. It is an iterative approach to discovering the
frequent itemsets.

Apriori says:
An itemset I is not frequent if:

• P(I) < minimum support threshold, then I is not frequent.
• P(I+A) < minimum support threshold, then I+A is not frequent, where A is another item added
  to the itemset.
• If an itemset has support below the minimum support, then all of its supersets will also fall
  below the minimum support and can be ignored. This property is called the antimonotone
  property.
The steps followed in the Apriori algorithm of data mining are:
1. Join Step: This step generates candidate (K+1)-itemsets from the frequent K-itemsets by joining
the set of frequent K-itemsets with itself.
2. Prune Step: This step scans the database to count each candidate itemset. If a candidate does
not meet minimum support, it is regarded as infrequent and is removed. This step
is performed to reduce the size of the candidate itemsets.
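The join step and the antimonotone property can be sketched together as a candidate generator. This is an illustrative sketch, not the tutorial's own code: the function name `generate_candidates` and the sample set of frequent 2-itemsets are assumptions, and itemsets are represented as `frozenset` objects.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Join frequent k-itemsets into (k+1)-itemset candidates, then drop any
    candidate that has an infrequent k-subset (antimonotone property)."""
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            # Keep only unions that grow the itemset by exactly one item.
            if len(union) != k + 1:
                continue
            # Every k-subset of a frequent (k+1)-itemset must itself be frequent.
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets, used only to exercise the function.
frequent_2 = {
    frozenset(p)
    for p in [("I1", "I2"), ("I1", "I3"), ("I2", "I3"), ("I2", "I4")]
}
print(generate_candidates(frequent_2))
```

Candidates returned by this function still have to be counted against the database; passing the subset check only means a candidate has not been ruled out yet.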

Steps In Apriori
The Apriori algorithm is a sequence of steps followed to find the frequent itemsets in a given
database. It applies the join and prune steps iteratively until no further frequent itemsets can be
found. A minimum support threshold is given in the problem or is assumed by the user.

#1) In the first iteration of the algorithm, each item is taken as a candidate 1-itemset. The algorithm
counts the occurrences of each item.
#2) Let there be some minimum support, min_sup (e.g. 2). The set of 1-itemsets whose occurrence
count satisfies min_sup is determined. Only those candidates whose count is greater than or equal to
min_sup are taken ahead to the next iteration; the others are pruned.
#3) Next, the frequent 2-itemsets with min_sup are discovered. For this, in the join step, the
candidate 2-itemsets are generated by combining the frequent 1-itemsets with each other in pairs.
#4) The 2-itemset candidates are pruned using the min_sup threshold value. Now the table will have
only the 2-itemsets that meet min_sup.
#5) The next iteration forms 3-itemsets using the join and prune steps. This iteration uses the
antimonotone property: the 2-itemset subsets of each candidate 3-itemset are checked against
min_sup. If all 2-itemset subsets are frequent, the 3-itemset is kept as a candidate and its support is
counted; otherwise it is pruned.
#6) The next step forms 4-itemsets by joining the frequent 3-itemsets with themselves and pruning
any candidate whose subsets do not meet the min_sup criteria. The algorithm stops when no further
frequent itemsets can be generated.

Example of Apriori: Support threshold=50%, Confidence=60%
TABLE-1
Transaction   List of items
T1            I1, I2, I3
T2            I2, I3, I4
T3            I4, I5
T4            I1, I2, I4
T5            I1, I2, I3, I5
T6            I1, I2, I3, I4
Solution:
Support threshold=50% => 0.5*6= 3 => min_sup=3

1. Count Of Each Item

TABLE-2
Item   Count
I1     4
I2     5
I3     4
I4     4
I5     2
2. Prune Step: TABLE-2 shows that item I5 does not meet min_sup=3, so it is deleted. Only I1,
I2, I3 and I4 meet the min_sup count.
TABLE-3
Item   Count
I1     4
I2     5
I3     4
I4     4
3. Join Step: Form 2-itemset. From TABLE-1 find out the occurrences of 2-itemset.
TABLE-4
Item     Count
I1,I2    4
I1,I3    3
I1,I4    2
I2,I3    4
I2,I4    3
I3,I4    2
4. Prune Step: TABLE-4 shows that itemsets {I1, I4} and {I3, I4} do not meet min_sup, so they are
deleted.
TABLE-5
Item     Count
I1,I2    4
I1,I3    3
I2,I3    4
I2,I4    3
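Steps 1 to 4 of the example can be reproduced with a short script. This is a sketch under the assumption that TABLE-1 is stored as a list of Python sets; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# TABLE-1 from the example.
transactions = [
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I4", "I5"},
    {"I1", "I2", "I4"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3", "I4"},
]
min_sup = 3  # 50% of 6 transactions

# Steps 1-2: count single items and keep those that meet min_sup.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = sorted(i for i, c in item_counts.items() if c >= min_sup)

# Steps 3-4: count pairs built from frequent items and keep those that meet min_sup.
keep = set(frequent_items)
pair_counts = Counter(
    pair
    for t in transactions
    for pair in combinations(sorted(t & keep), 2)
)
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_sup}

print("frequent items:", frequent_items)
print("frequent pairs:", frequent_pairs)
```

Sorting each transaction before calling `combinations` gives every pair a single canonical order, so ("I1", "I2") and ("I2", "I1") are never counted as different itemsets.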
5. Join and Prune Step: Form 3-itemsets. From TABLE-5, check whether every 2-itemset subset of
each candidate meets min_sup, then find the occurrences of the surviving candidates in TABLE-1.
For itemset {I1, I2, I3}, the subsets {I1, I2}, {I1, I3}, {I2, I3} all occur in TABLE-5, and
{I1, I2, I3} itself occurs 3 times in TABLE-1, thus {I1, I2, I3} is frequent.
For itemset {I1, I2, I4}, the subsets are {I1, I2}, {I1, I4}, {I2, I4}. {I1, I4} is not frequent, as it
does not occur in TABLE-5, thus {I1, I2, I4} is not frequent and is deleted.
In the same way, {I1, I3, I4} and {I2, I3, I4} are deleted because {I3, I4} does not occur in TABLE-5.
TABLE-6
Item
I1,I2,I3
I1,I2,I4
I1,I3,I4
I2,I3,I4
Only {I1, I2, I3} is frequent.
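As a cross-check on step 5, the four 3-itemsets listed in TABLE-6 can also be counted directly against TABLE-1. The sketch below assumes the same set-based representation of transactions; the variable names are illustrative.

```python
# TABLE-1 from the example.
transactions = [
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I4", "I5"},
    {"I1", "I2", "I4"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3", "I4"},
]
min_sup = 3

# The 3-itemsets listed in TABLE-6.
candidates = [
    ("I1", "I2", "I3"),
    ("I1", "I2", "I4"),
    ("I1", "I3", "I4"),
    ("I2", "I3", "I4"),
]

# Count how many transactions contain every item of each candidate.
counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
frequent_3 = [c for c, n in counts.items() if n >= min_sup]

for c, n in counts.items():
    print(c, n)
print("frequent 3-itemsets:", frequent_3)
```

The direct count agrees with the subset-based pruning: the three itemsets removed via the antimonotone property also fall below min_sup when counted.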
6. Generate Association Rules: From the frequent itemset discovered above, the association rules
could be:
{I1, I2} => {I3}
Confidence = support {I1, I2, I3} / support {I1, I2} = (3/4) * 100 = 75%

{I1, I3} => {I2}
Confidence = support {I1, I2, I3} / support {I1, I3} = (3/3) * 100 = 100%

{I2, I3} => {I1}
Confidence = support {I1, I2, I3} / support {I2, I3} = (3/4) * 100 = 75%

{I1} => {I2, I3}
Confidence = support {I1, I2, I3} / support {I1} = (3/4) * 100 = 75%

{I2} => {I1, I3}
Confidence = support {I1, I2, I3} / support {I2} = (3/5) * 100 = 60%

{I3} => {I1, I2}
Confidence = support {I1, I2, I3} / support {I3} = (3/4) * 100 = 75%

This shows that all the above association rules are strong if the minimum confidence threshold is 60%.
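The rule generation in step 6 can be sketched as a loop over every non-empty proper subset of the frequent itemset. The helper names `count` and `rules_from` are hypothetical, and the sketch assumes TABLE-1 is held as a list of sets.

```python
from itertools import combinations

# TABLE-1 from the example.
transactions = [
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I4", "I5"},
    {"I1", "I2", "I4"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3", "I4"},
]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from(itemset, min_conf):
    """Return (antecedent, consequent, confidence) for each rule meeting min_conf."""
    itemset = frozenset(itemset)
    total = count(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            conf = total / count(antecedent)
            if conf >= min_conf:
                consequent = tuple(sorted(itemset - set(antecedent)))
                rules.append((antecedent, consequent, conf))
    return rules


rules = rules_from({"I1", "I2", "I3"}, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, f"{conf:.0%}")
```

Raising `min_conf` above 0.6 would drop the rule {I2} => {I1, I3}, which sits exactly on the threshold used in the example.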

The Apriori Algorithm: Pseudo Code

C_k: Candidate itemsets of size k

L_k: Frequent itemsets of size k
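The level-wise loop over candidate and frequent itemsets of growing size can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the tutorial's original pseudo code: the function name `apriori`, the use of `frozenset` for itemsets, and an absolute count for `min_sup` are all choices made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: count} for every itemset whose count is >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    # Candidates of size 1: every distinct item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One database scan per level to count the size-k candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Frequent itemsets of size k.
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # Join the frequent k-itemsets with themselves; keep a (k+1)-candidate
        # only if all of its k-subsets are frequent (antimonotone property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return frequent


# TABLE-1 from the example, with min_sup = 3.
table_1 = [
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I4", "I5"},
    {"I1", "I2", "I4"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3", "I4"},
]
result = apriori(table_1, min_sup=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop ends when a level produces no candidates, which is the stopping condition for the iteration.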

Advantages
1. The algorithm is easy to understand.
2. The join and prune steps are easy to implement on large itemsets in large databases.
Disadvantages
1. It requires heavy computation if the itemsets are very large and the minimum support is kept
very low.
2. The entire database needs to be scanned repeatedly, once for each itemset size.
Methods To Improve Apriori Efficiency
Many methods are available for improving the efficiency of the algorithm.
1. Hash-Based Technique: This method uses a hash-based structure called a hash table for
generating the k-itemsets and their corresponding counts. It uses a hash function for generating
the table.
2. Transaction Reduction: This method reduces the number of transactions scanned in later
iterations. The transactions which do not contain frequent items are marked or removed.
3. Partitioning: This method requires only two database scans to mine the frequent itemsets. It
relies on the fact that for any itemset to be potentially frequent in the database, it must be
frequent in at least one of the partitions of the database.
4. Sampling: This method picks a random sample S from database D and then searches for
frequent itemsets in S. It may miss a globally frequent itemset. This risk can be
reduced by lowering the min_sup.
5. Dynamic Itemset Counting: This technique can add new candidate itemsets at any marked
start point of the database during the scanning of the database.
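As an illustration of the hash-based technique from the list above, the sketch below hashes every pair into a bucket during the first scan and then discards candidate pairs whose bucket count is below min_sup. The bucket count of 13 and the hash function `bucket` are assumptions chosen for this toy data; the tutorial does not specify them.

```python
from collections import Counter
from itertools import combinations

# TABLE-1 from the example.
transactions = [
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I4", "I5"},
    {"I1", "I2", "I4"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3", "I4"},
]
min_sup = 3
NUM_BUCKETS = 13  # assumed table size, chosen for illustration

# Assumed item ordering used by the illustrative hash function.
order = {item: i for i, item in enumerate(["I1", "I2", "I3", "I4", "I5"], start=1)}


def bucket(a, b):
    """Hypothetical hash function mapping an ordered pair of items to a bucket."""
    return (order[a] * 10 + order[b]) % NUM_BUCKETS


# Single scan: count items and, at the same time, hash every pair into a bucket.
item_counts = Counter()
bucket_counts = Counter()
for t in transactions:
    item_counts.update(t)
    for a, b in combinations(sorted(t), 2):
        bucket_counts[bucket(a, b)] += 1

frequent_items = sorted(i for i, c in item_counts.items() if c >= min_sup)

# A pair can only be frequent if its bucket count reaches min_sup, because the
# bucket count is an upper bound on the count of every pair hashed into it.
candidate_pairs = [
    (a, b)
    for a, b in combinations(frequent_items, 2)
    if bucket_counts[bucket(a, b)] >= min_sup
]
print(candidate_pairs)
```

Collisions can let an infrequent pair through, so the surviving candidates still need an exact count; the filter only guarantees that no frequent pair is discarded.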
Applications Of Apriori Algorithm
Some fields where Apriori is used:
1. In the education field: Extracting association rules from data on admitted students, based on
their characteristics and specialties.
2. In the medical field: For example, analysis of patient databases.
3. In forestry: Analysis of the probability and intensity of forest fires using forest fire data.
4. Apriori is used by many companies, such as Amazon in its recommender system and
Google for the auto-complete feature.
Conclusion
The Apriori algorithm is a simple level-wise algorithm that scans the database once for each itemset
size.

By pruning candidates whose subsets are infrequent, it considerably reduces the number of candidate
itemsets that must be counted. Thus, data mining helps consumers and industries make better
decisions.
