Programming Assignment
Association Analysis
Association rule mining is a technique for identifying underlying relations between different items. Take the example of a supermarket where customers can buy a variety of items. Usually, there is a pattern in what the customers buy. For instance, mothers with babies buy baby products such as milk and diapers, while other shoppers may buy makeup items or beer and chips. In short, transactions follow patterns, and more profit can be generated if the relationships between the items purchased in different transactions can be identified.
Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm. In this article we will study the theory behind the Apriori algorithm and will later implement the Apriori algorithm in Python.
There are three major components of the Apriori algorithm:
• Support
• Confidence
• Lift
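These measures are standard in association rule mining; for a rule X ⇒ Y over a set of transactions T, they are commonly defined as

\[ \mathrm{Support}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|} \]
\[ \mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \]
\[ \mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Confidence}(X \Rightarrow Y)}{\mathrm{Support}(Y)} \]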
Listing 1: Sample code for association analysis. Import libraries, read the file, and display the first rows of data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

# the file has no header row, so header=None keeps the first line as data
store_data = pd.read_csv('C:/Users/me/Downloads/store_data.csv', header=None)

store_data.head()
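The apyori package is not part of the standard Python distribution; if it is not already installed, it can typically be obtained from PyPI with:

pip install apyori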
Dataset:
The given dataset describes associations between different products, based on 7,500 transactions recorded over the course of a week at a French retail store.
In the output of store_data.head() you will see that the first line is treated as a record rather than as a header, because we passed header=None. To check the size of the dataset, execute the following script:

store_data.shape

The above script should return (7501, 20), which means the dataset has 7501 rows and 20 columns.
Now we will use the Apriori algorithm to find out which items are commonly sold together, so that the store owner can place related items together, or advertise them together, in order to increase profit.
Data Preprocessing
The Apriori library we are going to use requires our dataset to be in the form of a list of lists,
where the whole dataset is a big list and each transaction in the dataset is an inner list within
the outer big list. Currently we have data in the form of a pandas dataframe. To convert our
pandas dataframe into a list of lists, execute the following script:
records = []
for i in range(0, 7501):
    # convert every value in the row to a string so apriori can process it
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
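Because the transactions have different lengths, pandas pads the shorter rows with NaN, which becomes the string 'nan' in records. A possible variant that filters out these placeholders is sketched below; the filtering step is an assumption and not required by the assignment:

records = []
for i in range(0, 7501):
    # keep only real items, skipping the 'nan' placeholders from short rows
    row = [str(store_data.values[i, j]) for j in range(0, 20)
           if str(store_data.values[i, j]) != 'nan']
    records.append(row)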
Applying Apriori
The next step is to apply the Apriori algorithm on the dataset. To do so, we can use the apriori
class that we imported from the apyori library.
Execute the following script:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.0002, min_lift=1, min_length=2)
association_results = list(association_rules)
In the second line above we convert the rules found by the apriori class into a list, since it is easier to view the results in this form.
Let's first find the total number of rules mined by the apriori class. Execute the following script:

print(len(association_results))

The script should return 350. Now let us see what happens if we change the values of the given parameters.
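The re-run referred to below raises the lift threshold to min_lift=3. A minimal sketch of that call, assuming min_support stays at 0.0045 and taking min_confidence=0.2 as an assumed value (only the lift threshold is stated in the text), might look like this:

# re-run with a stricter lift threshold; min_confidence=0.2 is an assumption
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)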
Counting the rules again:

print(len(association_results))

The script above should return 48. We will stick with min_lift=3, as it gives an appropriate number of rules.
Each item corresponds to one rule. Let's print the first item in the association_results list to see the first rule. Execute the following script:

print(association_results[0])
The output is a single record containing three elements. The first element of the record shows the grocery items in the rule.
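The exact output depends on the data, but a record returned by apyori has roughly the following shape (concrete values elided here):

RelationRecord(items=frozenset({...}),
               support=...,
               ordered_statistics=[OrderedStatistic(items_base=frozenset({...}),
                                                    items_add=frozenset({...}),
                                                    confidence=...,
                                                    lift=...)])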
The following script displays the rule, the support, the confidence, and the lift for each rule in a clearer way:
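The display script itself is not reproduced in this document; a minimal sketch that produces output in the format shown under Results might look like this:

for item in association_results:
    # item[0] is a frozenset with the items that appear in the rule
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the rule
    print("Support: " + str(item[1]))

    # the ordered statistic at item[2][0] stores confidence at index 2 and lift at index 3
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")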
If you execute the above script, you will see all the rules returned by the apriori class. The first few rules look like this:
Results:
Support: 0.005732568990801126
Confidence: 0.3006993006993007
Lift: 3.790832696715049
=====================================
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
=====================================
Rule: ground beef -> herb & pepper
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
=====================================
Conclusion
Association rule mining algorithms such as Apriori are very useful for finding simple associations between data items. They are easy to implement and highly explainable. For more advanced insights, such as those used by Google or Amazon, more complex algorithms such as recommender systems are used. Still, as you can probably see, this method is a very simple way to get basic associations if that is all your use case needs.