
Data Mining Assignment

Samia, 17JZBCS0022 December 15, 2019

Association Analysis

Association rule mining is a technique for identifying underlying relationships between different items.
Take the example of a supermarket where customers can buy a variety of items. There is usually a
pattern in what customers buy: mothers with babies buy baby products such as milk and diapers, some
shoppers buy makeup items, and others buy beer and chips. In short, transactions follow patterns. More
profit can be generated if the relationships between the items purchased in different transactions can be
identified.

Apriori Algorithm for Association Rule Mining

Different statistical algorithms have been developed to implement association rule mining, and
Apriori is one such algorithm. In this assignment we will study the theory behind the Apriori algorithm
and then implement it in Python.

There are three major components of the Apriori algorithm (illustrated with a small sketch after this list):

• Support

• Confidence

• Lift
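Informally, support is the fraction of transactions that contain an itemset, confidence is how often the rule's consequent appears in transactions that already contain its antecedent, and lift compares that confidence with the consequent's overall frequency. The following is a minimal sketch of the three measures on a tiny made-up transaction list (the items and numbers are purely illustrative, not taken from the assignment dataset):

# Toy illustration of support, confidence and lift (hypothetical data).
transactions = [
    {'milk', 'diapers'},
    {'milk', 'bread'},
    {'milk', 'diapers', 'bread'},
    {'bread'},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in itemset.
    return sum(itemset <= t for t in transactions) / n

# Rule: {milk} -> {diapers}
antecedent, consequent = {'milk'}, {'diapers'}
rule_support = support(antecedent | consequent)   # P(milk and diapers) = 0.5
confidence = rule_support / support(antecedent)   # P(diapers | milk) ~ 0.67
lift = confidence / support(consequent)           # ~1.33, i.e. above chance

print(rule_support, confidence, lift)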

Listing 1: Sample code for association analysis: import the libraries, read the file, and display the first
rows of data.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

store_data = pd.read_csv('C:/Users/me/Downloads/store_data.csv', header=None)

store_data.head()

Dataset:

The given dataset describes associations between different products, covering 7500 transactions over
the course of a week at a French retail store.
In the output of store_data.head() you will see that the first line is treated as a record rather than as a
header, because we passed header=None.

We can find the number of rows and columns with:

store_data.shape

The above script should return (7501, 20), which means the dataset has 7501 rows and 20 columns.
Now we will use the Apriori algorithm to find out which items are commonly sold together, so that
the store owner can place related items together, or advertise them together, in order to increase profit.

Data Preprocessing

The Apriori library we are going to use requires our dataset to be in the form of a list of lists,
where the whole dataset is a big list and each transaction in the dataset is an inner list within
the outer big list. Currently we have data in the form of a pandas dataframe. To convert our
pandas dataframe into a list of lists, execute the following script:

records = []
for i in range(0, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
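Note that str() converts the missing values in shorter transactions into the literal string 'nan', which then appears as an item in its own right. A slightly more careful variant of the same conversion (just a sketch, still assuming the store_data dataframe loaded above) skips those placeholders:

# Same conversion, but skipping missing values so that 'nan'
# does not show up as a spurious item in the mined rules.
records = []
for i in range(0, 7501):
    records.append([str(v) for v in store_data.values[i] if pd.notna(v)])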

Applying Apriori

The next step is to apply the Apriori algorithm to the dataset. To do so, we can use the apriori
function that we imported from the apyori library.
Execute the following script:

association_rules = apriori(records, min_support=0.0045, min_confidence=0.0002, min_lift=1, min_length=2)
association_results = list(association_rules)

Now print the result:

print(len(association_results))

The script should return 1684.
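To put the thresholds in perspective: with 7501 transactions, min_support=0.0045 means an itemset has to occur in roughly 34 transactions before it is considered at all:

# 0.0045 of 7501 transactions is about 34 transactions.
print(0.0045 * 7501)   # 33.7545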


Let's see what happens if we change the values of the given parameters:

association_rules = apriori(records, min_support=0.0045, min_confidence=0.0002, min_lift=2, min_length=2)
association_results = list(association_rules)

Now print the result:

print(len(association_results))

The script should return 350. Now let's see what happens if we change the parameter values once more:

association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

In the second line we convert the rules found by the apriori function into a list, since it is easier to view
the results in this form.

Viewing the Results

Let's first find the total number of rules mined by the apriori function. Execute the following script:

print(len(association_results))

The script above should return 48. We will stick with min_lift=3, as it gives an appropriate number of
rules.
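The effect of the lift threshold can also be checked programmatically. Below is a small sketch (assuming the records list from above) that simply re-runs apriori with a few min_lift values, holding the other parameters at the values of the final run, and counts the surviving rules:

# Count how many rules survive different min_lift thresholds.
for lift_threshold in (1, 2, 3):
    rules = list(apriori(records, min_support=0.0045, min_confidence=0.2,
                         min_lift=lift_threshold, min_length=2))
    print("min_lift=" + str(lift_threshold) + ": " + str(len(rules)) + " rules")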
Each item corresponds to one rule. Let's print the first item in the association_results list to
see the first rule. Execute the following script:

print(association_results[0])

The output should look like this:

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])

The first item in the list is itself a record containing three items: the items in the rule, the support, and
the ordered statistics. The first of these shows the grocery items in the rule.
The following script displays the rule, the support, the confidence, and the lift for each rule in a
clearer way:

for item in association_results:
    # item[0] is the frozenset of items that appear in the rule
    # (base item and add item together)
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the rule
    print("Support: " + str(item[1]))

    # item[2] holds the ordered statistics; the first entry contains
    # the confidence and the lift of the rule
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

If you execute the above script, you will see all the rules returned by the apriori function. The first
four rules look like this:

Results:

Rule: light cream -> chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
=====================================
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801126
Confidence: 0.3006993006993007
Lift: 3.790832696715049
=====================================
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
=====================================
Rule: ground beef -> herb & pepper
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
=====================================
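For a larger rule set it can be more convenient to collect the same fields into a pandas dataframe, which makes sorting by lift or confidence trivial. Below is a minimal sketch, assuming the association_results list from above; note that a rule's antecedent or consequent can contain more than one item, hence the ', '.join:

# Collect rule, support, confidence and lift into a dataframe
# so the rules can be sorted and filtered easily.
rows = []
for item in association_results:
    stat = item[2][0]  # first OrderedStatistic of the record
    rows.append({
        'antecedent': ', '.join(stat[0]),   # items_base
        'consequent': ', '.join(stat[1]),   # items_add
        'support': item[1],
        'confidence': stat[2],
        'lift': stat[3],
    })

rules_df = pd.DataFrame(rows)
print(rules_df.sort_values('lift', ascending=False).head())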

Conclusion

Association rule mining algorithms such as Apriori are very useful for finding simple associations
between data items. They are easy to implement and highly explainable. However, for more advanced
insights, such as those used by Google or Amazon, more complex techniques such as recommender
systems are used. Still, this method is a very simple way to get basic associations if that is all your
use case needs.
