Apriori Algorithm in Word File
Note: All the contents of the images, including tables, calculations, and code, have been transcribed into this document.
Introduction
While machine learning covers techniques such as association, correlation, classification, and clustering, this tutorial primarily focuses on learning using association rules. By association rules, we identify the set of items or attributes that occur together in a table[1].
Association rule learning is one of the most important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Market basket analysis is a technique used by large retailers to discover associations between the items customers buy together. Two well-known algorithms for association rule learning are:
1. Apriori
2. Eclat
Apriori is an algorithm used for association rule learning. It searches for frequent sets of items in a dataset and builds on the associations and correlations between those itemsets. It is the algorithm behind the “You may also like” suggestions you commonly see on recommendation platforms[3].
The Apriori algorithm assumes that any subset of a frequent itemset must itself be frequent. Say a transaction containing {milk, eggs, bread} also contains {eggs, bread}. So, according to the Apriori principle, if {milk, eggs, bread} is frequent, then {eggs, bread} must also be frequent[4].
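This downward-closure property can be checked directly: in any transaction list, a subset of an itemset can never have lower support than the itemset itself. A minimal sketch (the toy transactions below are invented for illustration):

```python
# Toy transactions, invented for illustration.
transactions = [
    {"milk", "eggs", "bread"},
    {"eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Apriori property: a subset is at least as frequent as its superset.
sup_full = support({"milk", "eggs", "bread"}, transactions)  # 1 of 4 transactions
sup_sub = support({"eggs", "bread"}, transactions)           # 3 of 4 transactions
assert sup_sub >= sup_full
```

Because the property holds, the algorithm can safely prune any candidate itemset that has an infrequent subset without counting it.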
In order to select the interesting rules out of the many possible rules, the following measures are commonly used:
Support
Confidence
Lift
Conviction
Support
Support of item x is the ratio of the number of transactions in which item x appears to the total number of transactions.
Confidence
Confidence (x => y) signifies the likelihood of the item y being purchased when item x is purchased. It is computed as Support(x and y) / Support(x).
Lift
Lift (x => y) is the ‘interestingness’ of the rule, i.e., the likelihood of the item y being purchased when item x is sold. Unlike confidence (x => y), this measure takes into account the popularity of the item y: Lift (x => y) = Confidence (x => y) / Support(y).
Lift (x => y) > 1 means that there is a positive correlation within the itemset, i.e., the products in it are likely to be bought together.
Lift (x => y) < 1 means that there is a negative correlation within the itemset, i.e., the products in it are unlikely to be bought together.
Conviction
Conviction (x => y) measures how much more often the rule would be wrong if x and y were independent; it is computed as (1 − Support(y)) / (1 − Confidence (x => y)).
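The four measures above can be computed directly from transaction counts. A minimal sketch, using invented toy transactions (the item names and values are for illustration only):

```python
# Toy transactions, invented for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk"},
    {"milk", "bread"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / n

# Measures for the rule: milk => bread
sup_x = support({"milk"})             # 4/5 = 0.8
sup_y = support({"bread"})            # 4/5 = 0.8
sup_xy = support({"milk", "bread"})   # 3/5 = 0.6

confidence = sup_xy / sup_x                   # 0.75
lift = confidence / sup_y                     # 0.9375 (< 1: slight negative correlation)
conviction = (1 - sup_y) / (1 - confidence)   # 0.8
```

Note that on this toy data the lift is below 1, so buying milk actually makes bread slightly *less* likely than its base rate; a useful rule would have lift above 1.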
Figure 5. The set of items including milk, bread, egg, cookie, coffee and juice
Step-1:
In the first step, we index the data and then calculate the support for each item; if the support of an item is less than the minimum support, we eliminate it from further consideration.
Step-2:
Figure 8. Continue to calculate the support and select the best answer
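The level-wise procedure in the steps above can be sketched as follows: count the support of single items, drop those below the minimum support, then join the survivors into larger candidates and repeat. The transactions and threshold here are invented for illustration:

```python
# Toy transactions and threshold, invented for illustration.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori_frequent(transactions, min_support):
    """Return all itemsets whose support clears min_support, level by level."""
    items = {i for t in transactions for i in t}
    # Level 1: frequent single items.
    frequent = [frozenset([i]) for i in sorted(items)
                if support(frozenset([i]), transactions) >= min_support]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        # Join step: build k-item candidates from frequent (k-1)-item sets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep only candidates that clear the threshold.
        frequent = [c for c in candidates
                    if support(c, transactions) >= min_support]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

freq = apriori_frequent(transactions, min_support=0.5)
```

On this data every single item, every pair, and the full triple {milk, bread, eggs} survive the 0.5 threshold, giving seven frequent itemsets in total.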
Part (b): Show two rules that have a confidence of 70% or greater for an itemset containing three items from part (a).
Step-1:
Step-2:
In addition to the above rules, the following can also be considered, but the question only asked for two.
Problem Statement:
For the implementation of the Apriori algorithm, we use data collected from a supermarket, where each row indicates all the items purchased in a particular transaction.
The manager of a retail store is trying to find an association rule between items, to figure out which items are more often bought together, so that he can place those items together in order to increase sales.
The dataset has 7,500 entries; a Drive link to download it is given in the references[4][6].
Environment Setup:
Before we move forward, we need to install the ‘apyori’ package, e.g. by running `pip install apyori` at the command prompt.
Now we proceed by reading our dataset, which is in CSV format. We do that using pandas.
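The loading step can be sketched as below. The file name and layout (one transaction per row, items in separate columns, no header row) are assumptions; substitute your own file. A tiny stand-in CSV is written first so the snippet is self-contained:

```python
import pandas as pd

# Write a tiny stand-in for the supermarket file (contents invented).
with open("transactions.csv", "w") as f:
    f.write("milk,bread,eggs\nbread,eggs\nmilk,bread\n")

# header=None: the file has no column names, just items.
dataset = pd.read_csv("transactions.csv", header=None)

# Each row becomes one transaction: a list of its non-missing items as strings
# (shorter rows are padded with NaN by pandas, which we filter out).
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in dataset.values
]
```

The apriori function expects exactly this shape: a list of transactions, each a list of item strings.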
We import the apriori function from the apyori module and store its output in a results variable. The key parameters are chosen as follows: we decide that an item of interest should appear in at least 3 transactions a day, and our data is collected over a week, so the minimum support is set to 3 × 7 / 7500 ≈ 0.003. The minimum length is set to 2, as we are calculating the lift values for buying an item B given that an item A is already bought.
In the LHS variable, we store the first item of each resulting rule; the item that is bought given that first item is stored in the RHS variable. The supports, confidences, and lifts lists store all the support, confidence, and lift values from the results[6].
def inspect(results):
    # Each apyori result holds (items, support, ordered_statistics); the first
    # ordered statistic gives the rule's base items, added items, confidence, lift.
    lhs = [tuple(result[2][0][0])[0] for result in results]   # antecedent item
    rhs = [tuple(result[2][0][1])[0] for result in results]   # consequent item
    supports = [result[1] for result in results]              # itemset support
    confidences = [result[2][0][2] for result in results]     # rule confidence
    lifts = [result[2][0][3] for result in results]           # rule lift
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results),
    columns=["Left hand side", "Right hand side", "Support", "Confidence", "Lift"])
Finally, we store these variables into one dataframe, so that they are easier to visualize.
resultsinDataFrame
Figure 17. Variables into one dataframe
The store manager can use these association rules to boost sales and prioritize giving offers on the pairs of items with greater lift values[6].
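To surface the pairs worth promoting, the result dataframe can be sorted by lift. A minimal sketch with invented rule values (in the tutorial this would be applied to the resultsinDataFrame built above):

```python
import pandas as pd

# Invented example rules; in practice this is the dataframe built from apyori's output.
rules = pd.DataFrame({
    "Left hand side": ["herb & pepper", "pasta", "light cream"],
    "Right hand side": ["ground beef", "escalope", "chicken"],
    "Support": [0.016, 0.006, 0.005],
    "Confidence": [0.32, 0.37, 0.29],
    "Lift": [3.29, 4.70, 4.84],
})

# Highest-lift rules first: the strongest candidates for joint offers.
top = rules.nlargest(n=3, columns="Lift")
```

Sorting by lift rather than confidence avoids favoring rules whose consequent is simply a very popular item.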
Why Apriori?
Despite being simple, the Apriori algorithm has some limitations, including:
1. It wastes time when handling a large number of candidate frequent itemsets.
2. Its efficiency goes down when there is a large number of transactions.
3. It requires high computational power and needs to scan the entire database repeatedly[4].
Summary
Figure 19. Flowchart of Apriori algorithm[8]
Association rule learning is an unsupervised learning technique that checks for the dependency of one data item on another and maps them accordingly so that the result can be made more profitable. It tries to find interesting relations or associations among the variables of the dataset. The flowchart above summarizes the entire working of the algorithm[2].
GitHub repository with the full code
References:
[1] https://www.softwaretestinghelp.com/apriori-algorithm/
[2] https://www.javatpoint.com/association-rule-learning
[3] 1b1d7a8b7bc
[4] https://intellipaat.com/blog/data-science-apriori-algorithm/
[5] 2019, https://www.journals.elsevier.com/information-and-software-technology
[6] https://djinit-ai.github.io/2020/09/22/apriori-algorithm.html#understanding-our-used-case
[7] https://www.datacamp.com/tutorial/market-basket-analysis-r
[8] https://www.researchgate.net/figure/Flowchart-of-Apriori-algorithm_fig2_351361530