Implementing Apriori algorithm in Python
Last Updated: 11 Jan, 2023
Prerequisites: Apriori Algorithm
The Apriori algorithm is a Machine Learning algorithm used to gain insight into the structured relationships between the different items that occur together in transactions. Its most prominent practical application is recommending products based on the products already present in the user’s cart. Walmart in particular has made great use of the algorithm in suggesting products to its users.
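The walkthrough in this article relies on a library implementation. As a rough illustration of the idea behind the algorithm, the sketch below finds frequent itemsets level by level: it keeps only itemsets whose support (the share of transactions containing them) meets a threshold, and it builds larger candidates only from smaller itemsets that are already frequent. The function name `frequent_itemsets` and the toy transactions are made up for this example; this is a minimal sketch, not the implementation used by any particular library.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Hypothetical helper written for illustration only.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of transactions that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Toy transactions (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With this data, the pair milk and butter appears in only one of four transactions, so it is dropped, and the three-item set is never counted because one of its pairs is not frequent.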
Dataset: Groceries data (the code below reads it from the file Online_Retail.xlsx)
Implementation of algorithm in Python:
Step 1: Importing the required libraries
Python3
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
Step 2: Loading and exploring the data
Python3
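# The next line is an IPython/Jupyter command for changing directory, not plain Python.
# In a script, pass the full path of the file to read_excel instead.
# %cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm
data = pd.read_excel('Online_Retail.xlsx')
data.head()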
Step 3: Cleaning the Data
Python3
# Strip stray whitespace from the item descriptions
data['Description'] = data['Description'].str.strip()
# Drop rows that have no invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')
# Drop invoices whose number contains 'C'
data = data[~data['InvoiceNo'].str.contains('C')]
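To see what the cleaning step does, the same four operations can be run on a few made-up rows. The DataFrame `toy` and its values are invented for illustration; only the column names `InvoiceNo` and `Description` come from the code above.

```python
import pandas as pd

# Made-up rows that mimic the two columns touched by the cleaning step.
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", None, 536370],
    "Description": ["  PAPER CUPS ", "PAPER PLATES", "TEA PLATE", "PAPER PLATES  "],
})

toy["Description"] = toy["Description"].str.strip()      # trim whitespace
toy.dropna(axis=0, subset=["InvoiceNo"], inplace=True)   # row without invoice goes
toy["InvoiceNo"] = toy["InvoiceNo"].astype("str")        # mixed ints/strings -> strings
toy = toy[~toy["InvoiceNo"].str.contains("C")]           # invoice 'C536379' goes

print(toy)
```

Two rows survive: the one with the missing invoice number and the one whose invoice number contains 'C' are removed, and the remaining descriptions no longer carry leading or trailing spaces.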
Step 4: Splitting the data according to the region of transaction
Python3
basket_France = (data[data['Country'] == "France"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
basket_UK = (data[data['Country'] == "United Kingdom"]
             .groupby(['InvoiceNo', 'Description'])['Quantity']
             .sum().unstack().reset_index().fillna(0)
             .set_index('InvoiceNo'))
basket_Por = (data[data['Country'] == "Portugal"]
              .groupby(['InvoiceNo', 'Description'])['Quantity']
              .sum().unstack().reset_index().fillna(0)
              .set_index('InvoiceNo'))
basket_Sweden = (data[data['Country'] == "Sweden"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
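The four blocks above differ only in the country name. As a sketch of what each one produces, the hypothetical helper `make_basket` below performs the same group, sum and pivot on a small made-up table, using `unstack(fill_value=0)` as a shorter stand-in for the `unstack().reset_index().fillna(0).set_index('InvoiceNo')` chain. The helper name and the toy rows are assumptions made for this example.

```python
import pandas as pd


def make_basket(data, country):
    """Pivot one country's rows into an invoice-by-item quantity table.

    Hypothetical helper; it mirrors the per-country blocks in the article.
    """
    rows = data[data["Country"] == country]
    return (rows.groupby(["InvoiceNo", "Description"])["Quantity"]
                .sum()
                .unstack(fill_value=0))


# Made-up transactions: invoice 2 lists CUPS twice, invoice 3 is not French.
toy = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "2", "2", "3"],
    "Description": ["CUPS", "PLATES", "CUPS", "CUPS", "NAPKINS", "TEA PLATE"],
    "Quantity": [6, 6, 12, 3, 1, 4],
    "Country": ["France"] * 5 + ["United Kingdom"],
})

basket_fr = make_basket(toy, "France")
print(basket_fr)
```

Each row of the result is one invoice and each column one item; repeated lines for the same item are summed, and items absent from an invoice get a quantity of 0.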
Step 5: One-hot encoding the Data
Python3
def hot_encode(x):
    # Any positive quantity means the item is present in the invoice
    if x <= 0:
        return 0
    if x >= 1:
        return 1

# Note: pandas 2.1+ deprecates applymap in favour of DataFrame.map
basket_encoded = basket_France.applymap(hot_encode)
basket_France = basket_encoded

basket_encoded = basket_UK.applymap(hot_encode)
basket_UK = basket_encoded

basket_encoded = basket_Por.applymap(hot_encode)
basket_Por = basket_encoded

basket_encoded = basket_Sweden.applymap(hot_encode)
basket_Sweden = basket_encoded
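For whole-number quantities, the element-wise `hot_encode` function can also be expressed as a single vectorized comparison. The sketch below shows this on a made-up basket; it yields a boolean table rather than 0/1 integers, which recent mlxtend releases appear to prefer as input, though that is worth checking against the installed version.

```python
import pandas as pd

# Made-up basket: quantities per invoice, including a negative (returned) line.
basket = pd.DataFrame(
    {"CUPS": [6, 15, 0], "PLATES": [6, 0, -2]},
    index=pd.Index(["1", "2", "3"], name="InvoiceNo"),
)

# True where at least one unit was bought, False otherwise.
encoded = basket >= 1
print(encoded)
```

The comparison runs on whole columns at once, so it avoids calling a Python function for every cell.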
Step 6: Building the models and analyzing the results
a) France:
Python3
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

From the above output, it can be seen that paper cups and paper plates are bought together in France. One possible explanation is that the French have a culture of getting together with friends and family at least once a week. Restrictions on plastic in France may also push people toward paper-based alternatives.
b) United Kingdom:
Python3
frq_items = apriori(basket_UK, min_support=0.01, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

If the rules for British transactions are analyzed a little deeper, it is seen that British customers buy different colored tea plates together. A possible reason is that the British typically enjoy tea very much and often collect different colored tea plates for different occasions.
c) Portugal:
Python3
frq_items = apriori(basket_Por, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the association rules for Portuguese transactions, it is observed that tiffin sets (Knick Knack Tins) and color pencils are paired together. Both products are typically used by primary-school children, who need them to carry their lunch and to do creative work respectively, so it makes sense for them to be bought together.
d) Sweden:
Python3
frq_items = apriori(basket_Sweden, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the above rules, it is found that boys’ and girls’ cutlery are paired together. This makes practical sense because when parents go shopping for cutlery for their children, they would want the product to be a little customized according to each kid’s wishes.
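The rules in Step 6 are filtered by lift and sorted by confidence and lift. As a reminder of what those numbers mean, the sketch below computes support, confidence and lift for a single rule directly from their standard definitions. The function name `rule_metrics` and the toy transactions are made up for this example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper written for illustration only.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= t for t in transactions)           # baskets with antecedent
    count_c = sum(c <= t for t in transactions)           # baskets with consequent
    count_both = sum((a | c) <= t for t in transactions)  # baskets with both

    support = count_both / n             # how often the rule's items occur together
    confidence = count_both / count_a    # P(consequent | antecedent)
    lift = confidence / (count_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Toy transactions (made-up data).
transactions = [
    {"cups", "plates"},
    {"cups", "plates", "napkins"},
    {"napkins"},
    {"tea"},
]
support, confidence, lift = rule_metrics(transactions, {"cups"}, {"plates"})
print(support, confidence, lift)  # 0.5 1.0 2.0
```

A lift above 1 means the two sides of the rule appear together more often than they would if they were independent, which is why the calls above keep only rules with a lift of at least 1.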