Implementing Apriori algorithm in Python
Last Updated: 26 May, 2025
The Apriori algorithm is an association rule mining algorithm commonly used for market basket analysis. It finds associations between items in large transactional datasets. A common real-world application is product recommendation, where items are suggested to users based on the contents of their shopping cart. Companies like Walmart have used this algorithm to improve product suggestions and drive sales.
In this article we'll walk through a step-by-step implementation of the Apriori algorithm in Python using the mlxtend library.
Step 1: Importing Required Libraries
Before we begin, we need to import the necessary Python libraries: Pandas, NumPy and mlxtend.
Python
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
Step 2: Loading and exploring the data
We start by loading a popular groceries dataset. It contains customer transactions with details such as the customer ID, the transaction date and the item purchased. You can download the dataset from here.
Python
import pandas as pd
df = pd.read_csv("/content/Groceries_dataset.csv")
print(df.head())
Output:
Dataset
- Each row represents one item in a customer's basket on a given date.
- To use the Apriori algorithm we must convert this into full transactions per customer per visit.
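To make the difference between item rows and full transactions concrete, here is a minimal sketch in plain Python. The rows are made up for illustration (they are not taken from the dataset) and only mimic its three columns: Member_number, Date and itemDescription.

```python
from collections import defaultdict

# Hypothetical rows shaped like the dataset:
# (Member_number, Date, itemDescription)
toy_rows = [
    (1000, "15-03-2015", "sausage"),
    (1000, "15-03-2015", "whole milk"),
    (1000, "24-06-2014", "pastry"),
    (1001, "07-02-2015", "frankfurter"),
    (1001, "07-02-2015", "whole milk"),
]

# Items that share the same (member, date) key belong to one basket
toy_baskets = defaultdict(list)
for member, date, item in toy_rows:
    toy_baskets[(member, date)].append(item)

toy_transactions = list(toy_baskets.values())
print(toy_transactions)
# [['sausage', 'whole milk'], ['pastry'], ['frankfurter', 'whole milk']]
```

Five item rows collapse into three transactions, one per customer visit.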
Step 3: Group Items by Transaction
We group items purchased together by the same customer on the same day to form one transaction.
Python
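# Collect all items bought by the same member on the same date into one list
basket = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()
# Keep only the item lists: one list per transaction
transactions = basket['itemDescription'].tolist()
print(transactions)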
Output:
Group items
Step 4: Convert to One-Hot Format
Apriori needs the data in True/False format, where each value answers the question "Did the item appear in the basket?". We use TransactionEncoder for this:
Python
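from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
# Learn the set of distinct items, then encode each transaction as a boolean row
te_array = te.fit(transactions).transform(transactions)
# One column per item, one row per transaction
df_encoded = pd.DataFrame(te_array, columns=te.columns_)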
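As a rough illustration of what this encoding produces, the sketch below builds the same kind of True/False table by hand in plain Python. The baskets are made up for illustration, and this is not how TransactionEncoder is implemented; it only shows the shape of the result, with one column per distinct item and one row per basket.

```python
# Toy baskets, made up for illustration (not taken from the dataset)
toy_transactions = [
    ["whole milk", "rolls/buns"],
    ["soda"],
    ["whole milk", "soda", "yogurt"],
]

# One column per distinct item, in sorted order
columns = sorted({item for basket in toy_transactions for item in basket})

# One row per basket: True if the item is in the basket, else False
encoded = [[item in basket for item in columns] for basket in toy_transactions]

print(columns)
for row in encoded:
    print(row)
# ['rolls/buns', 'soda', 'whole milk', 'yogurt']
# [True, False, True, False]
# [False, True, False, False]
# [False, True, True, True]
```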
Step 5: Run Apriori Algorithm
Now we find frequent itemsets, that is, combinations of items that often occur together. Here min_support=0.01 keeps only itemsets that appear in at least 1% of all transactions.
Python
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)
print("Total Frequent Itemsets:", frequent_itemsets.shape[0])
Output:
Total Frequent Itemsets: 69
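To show the idea behind the search, here is a small teaching sketch of the level-wise Apriori procedure in plain Python. It is not mlxtend's implementation: the function name toy_apriori and the toy baskets are assumptions made for this example. It relies on the Apriori property that every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset can be discarded without counting them.

```python
from itertools import combinations

def toy_apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy baskets, made up for illustration
toy_baskets = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["milk", "soda"],
    ["bread", "butter"],
    ["milk", "bread", "butter", "soda"],
]

result = toy_apriori(toy_baskets, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a 60% threshold on these five baskets, the sketch keeps three single items (bread, butter, milk) and two pairs (bread with butter, bread with milk). The triple bread, butter, milk is pruned without being counted, because the pair butter and milk is not frequent.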
Step 6: Generate Association Rules
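Now we find rules such as "If bread and butter are bought, milk is also likely to be bought." Each rule A → B is described by three metrics:
- Support: How often the items of the rule (A and B together) appear in the dataset.
- Confidence: Probability of buying item B given that item A is bought.
- Lift: How much more often A and B occur together than would be expected if they were independent. A lift above 1 indicates a positive association.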
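The three metrics can be worked through by hand on a small example. The sketch below uses made-up baskets and the rule bread → butter; the helper name support is an assumption for this example. Fractions are used so the arithmetic stays exact.

```python
from fractions import Fraction

# Toy baskets, made up for illustration
toy_baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "soda"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "soda"},
]

def support(itemset):
    # Share of baskets that contain every item of the itemset
    hits = sum(1 for basket in toy_baskets if itemset <= basket)
    return Fraction(hits, len(toy_baskets))

antecedent = {"bread"}   # A
consequent = {"butter"}  # B

rule_support = support(antecedent | consequent)  # 3 of 5 baskets
confidence = rule_support / support(antecedent)  # (3/5) / (4/5) = 3/4
lift = confidence / support(consequent)          # (3/4) / (3/5) = 5/4

print("support   :", float(rule_support))  # 0.6
print("confidence:", float(confidence))    # 0.75
print("lift      :", float(lift))          # 1.25
```

Butter appears in 60% of all baskets but in 75% of the baskets that contain bread, so the lift is 1.25: buying bread makes butter more likely than it is overall.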
Python
from mlxtend.frequent_patterns import association_rules
# Keep only rules whose confidence is at least 0.1
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
# Sanity check: keep rules with a non-empty antecedent and consequent
rules = rules[rules['antecedents'].apply(lambda x: len(x) >= 1) & rules['consequents'].apply(lambda x: len(x) >= 1)]
print("Association Rules:", rules.shape[0])
# Show the first five rules with their key metrics
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(5))
Output:
Association rules
Step 7: Visualize the Most Popular Items
Let’s see which items are most frequently bought:
Python
import matplotlib.pyplot as plt
top_items = df['itemDescription'].value_counts().head(10)
top_items.plot(kind='bar', title='Top 10 Most Purchased Items')
plt.xlabel("Item")
plt.ylabel("Count")
plt.show()
Output:
Most Purchased Items
As shown in the output above, whole milk is the most frequently bought item, followed by other vegetables, rolls/buns and soda.