RFM Analysis Using Python

Last Updated : 01 May, 2025

RFM stands for recency, frequency, and monetary value. In business analytics, the concept is used to divide customers into segments such as high-value, medium-value, and low-value customers.

The goal is to calculate the RFM scores and categorize customers into segments like Top Customers, High Value Customers, Medium Value Customers, Low Value Customers, and Lost Customers. This segmentation helps businesses target their marketing efforts effectively.

Let’s assume we are a company named geek and perform an RFM analysis on our customers using three measures:

Recency: How recently the customer made a transaction with us.
Frequency: How often the customer orders or buys a product from us.
Monetary: How much the customer spends on our products.
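
To make the three definitions concrete, here is a minimal sketch that computes them by hand for a single customer. The toy data is hypothetical, the column names mirror the ones used later in this article, and measuring recency from the latest purchase date in the data is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical toy transactions (not the article's dataset)
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-15", "2023-03-10",
    ]),
    "TransactionAmount": [50.0, 70.0, 200.0, 20.0, 30.0, 40.0],
})

# Reference point: the latest purchase date anywhere in the data
snapshot = toy["PurchaseDate"].max()

# All transactions of customer 1
one = toy[toy["CustomerID"] == 1]

recency_days = (snapshot - one["PurchaseDate"].max()).days  # days since last purchase
frequency = len(one)                                        # number of transactions
monetary = one["TransactionAmount"].sum()                   # total amount spent

print(recency_days, frequency, monetary)
```

Customer 1 last bought on 2023-03-01, nine days before the latest purchase in the toy data, made two purchases, and spent 120.0 in total.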

Getting Started

Loading the Necessary Libraries and the Data

Here, we will import the required modules (pandas, datetime, numpy, and matplotlib) and read the dataset into a DataFrame.

Dataset Used : rfm_data.csv

Python

import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

# Importing the data (appending ?raw=true asks GitHub for the raw CSV instead of the HTML page)
df = pd.read_csv('https://github.com/M0hamedIbrahim1/RFM-Analysis/blob/main/Dataset/rfm_data.csv?raw=true')
df.head()

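
Before going further, it can help to confirm that the columns the later steps rely on are present. The sketch below is one possible check. The helper name missing_columns and the sample frame are hypothetical; the three column names are the ones this article uses.

```python
import pandas as pd

# Columns the rest of this walkthrough reads from the DataFrame
REQUIRED = ["CustomerID", "PurchaseDate", "TransactionAmount"]

def missing_columns(frame, required=REQUIRED):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Hypothetical one-row sample standing in for the real dataset
sample = pd.DataFrame({
    "CustomerID": [1],
    "PurchaseDate": ["2023-01-05"],
    "TransactionAmount": [10.0],
})

ok = missing_columns(sample)
broken = missing_columns(sample.drop(columns="PurchaseDate"))
print(ok, broken)
```

An empty list means every required column was found; on the real data the call would be missing_columns(df).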

Calculating Recency

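In this step, we calculate Recency for each customer. Recency represents how recently a customer made a purchase. We find each customer's most recent purchase date and count the days between it and the latest purchase date in the whole dataset, so a smaller value means a more recent purchase.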

Python

df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])  # Converting PurchaseDate to datetime
df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
recent_date = df_recency['LastPurchaseDate'].max()
df_recency['Recency'] = (recent_date - df_recency['LastPurchaseDate']).dt.days  # Days since last purchase
df_recency.head()

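
As a small worked example of the recency step, the sketch below applies the same logic to hypothetical toy data, so the day counts can be checked by hand.

```python
import pandas as pd

# Hypothetical toy purchases (not the article's dataset)
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "PurchaseDate": pd.to_datetime(
        ["2023-01-05", "2023-03-01", "2023-02-10", "2023-03-10"]
    ),
})

# Most recent purchase per customer
last = toy.groupby("CustomerID", as_index=False)["PurchaseDate"].max()
last.columns = ["CustomerID", "LastPurchaseDate"]

# Latest purchase in the whole toy dataset, used as the reference date
reference = last["LastPurchaseDate"].max()

# Whole days between the reference date and each customer's last purchase
last["Recency"] = (reference - last["LastPurchaseDate"]).dt.days
print(last)
```

Customer 3 made the latest purchase and gets a recency of 0, customer 1 gets 9, and customer 2 gets 28.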

Calculating Frequency

We now calculate Frequency. This is the number of purchases each customer made, counted after dropping exact duplicate rows.

Python

frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()

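
The drop_duplicates() call matters only when the data contains rows that are identical in every column. The sketch below shows the difference on hypothetical toy data in which one transaction of customer 1 appears twice.

```python
import pandas as pd

# Hypothetical toy data: the first two rows are exact duplicates
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "PurchaseDate": pd.to_datetime(
        ["2023-01-05", "2023-01-05", "2023-03-01", "2023-02-10"]
    ),
    "TransactionAmount": [50.0, 50.0, 70.0, 200.0],
})

# Counting every row, duplicates included
raw_counts = toy.groupby("CustomerID")["PurchaseDate"].count()

# Counting after removing rows that are identical in all columns
dedup_counts = toy.drop_duplicates().groupby("CustomerID")["PurchaseDate"].count()

print(raw_counts.to_dict(), dedup_counts.to_dict())
```

Without deduplication customer 1 appears to have three purchases; with it, two. Whether identical rows are true duplicates or genuine repeat purchases depends on the dataset, so this step is a judgment call.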

Calculating Monetary Value

Next, we calculate the Monetary Value, which is the total amount each customer has spent. This is computed by summing up the TransactionAmount for each customer.

Python

df['Total'] = df['TransactionAmount']  # Amount of each transaction; summed per customer below
monetary_df = df.groupby(by='CustomerID', as_index=False)['Total'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()


Merging All Three Columns in One DataFrame

At this point, we merge the Recency, Frequency, and Monetary values into one DataFrame for analysis.

Python

rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()

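
As an alternative to building three DataFrames and merging them, the same table can be produced in a single pass with named aggregation. This is a sketch on hypothetical toy data, not the article's method, and it skips the duplicate-row removal used in the Frequency step.

```python
import pandas as pd

# Hypothetical toy transactions (not the article's dataset)
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-15", "2023-03-10",
    ]),
    "TransactionAmount": [50.0, 70.0, 200.0, 20.0, 30.0, 40.0],
})

# One groupby computes the last purchase date, purchase count, and total spend
toy_rfm = toy.groupby("CustomerID").agg(
    LastPurchase=("PurchaseDate", "max"),
    Frequency=("PurchaseDate", "count"),
    Monetary=("TransactionAmount", "sum"),
)

# Recency in days, measured from the latest purchase in the data
snapshot = toy["PurchaseDate"].max()
toy_rfm["Recency"] = (snapshot - toy_rfm["LastPurchase"]).dt.days
toy_rfm = toy_rfm.drop(columns="LastPurchase")
print(toy_rfm)
```

Here CustomerID ends up as the index rather than a column; calling reset_index() would give the same layout as the merged rfm_df.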

Ranking Customers Based on Recency, Frequency, and Monetary Values

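We then rank the customers on Recency, Frequency, and Monetary values. Recency is ranked in descending order so that the most recent buyers (smallest Recency) receive the highest rank, while Frequency and Monetary are ranked in ascending order. Each rank is then divided by the maximum rank and multiplied by 100, which puts all three on a common scale with a maximum of 100.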

Python

# Ranking customers based on Recency, Frequency, and Monetary
rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)

# Normalizing the ranks
rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100

# Dropping the individual ranks
rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()

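
The ranking and normalization can be followed by hand on a few values. The sketch below uses a hypothetical Recency series with one tie; by default, rank() gives tied values the average of the ranks they span.

```python
import pandas as pd

# Hypothetical recency values in days; two customers tie at 9
recency = pd.Series([0, 9, 28, 9], name="Recency")

# ascending=False: the largest recency (least recent buyer) gets rank 1,
# so the most recent buyer ends up with the highest rank
r_rank = recency.rank(ascending=False)

# Scale so that the best rank maps to 100
r_norm = r_rank / r_rank.max() * 100

print(r_rank.tolist())
print(r_norm.tolist())
```

The two tied customers share rank 2.5 (the average of ranks 2 and 3), which normalizes to 62.5, while the most recent buyer scores 100 and the least recent scores 25.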

Calculating the RFM Score

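Now we calculate the RFM score, which is a weighted sum of the normalized Recency, Frequency, and Monetary ranks. This score helps in categorizing the customers.

The formula for calculating the RFM score is: RFM Score = 0.05 * ((0.15 * R_rank_norm) + (0.28 * F_rank_norm) + (0.57 * M_rank_norm))

The weights sum to 1, so the weighted sum has a maximum of 100. Multiplying by 0.05 rescales it to a maximum of 5, which is the scale the segment thresholds use.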

Python

rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
rfm_df['RFM_Score'] *= 0.05
rfm_df = rfm_df.round(2)

# Displaying the first few rows with CustomerID, RFM_Score
rfm_df[['CustomerID', 'RFM_Score']].head(7)

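
To see how the weights and the 0.05 factor combine, the sketch below wraps the formula in a small function and evaluates it for two sets of normalized ranks. The function name rfm_score and the input values are hypothetical; the weights are the ones used above.

```python
def rfm_score(r_norm, f_norm, m_norm):
    """Weighted sum of normalized ranks (each up to 100), rescaled to a maximum of 5."""
    weighted = 0.15 * r_norm + 0.28 * f_norm + 0.57 * m_norm
    return round(weighted * 0.05, 2)

# A customer with the top rank on all three measures reaches the maximum score
best = rfm_score(100, 100, 100)

# A recent buyer with average frequency but low spend scores much lower,
# because Monetary carries the largest weight
mixed = rfm_score(80, 50, 20)

print(best, mixed)
```

For the second customer the weighted sum is 12 + 14 + 11.4 = 37.4, and 37.4 * 0.05 = 1.87.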

Rating Customers Based on the RFM Score

We assign customer segments based on their RFM score:

Top Customers: RFM score > 4.5
High Value Customer: 4 < RFM score <= 4.5
Medium Value Customer: 3 < RFM score <= 4
Low Value Customers: 1.6 < RFM score <= 3
Lost Customers: RFM score <= 1.6

Python

# Conditions are checked from the highest threshold down; the first match wins
rfm_df["Customer_segment"] = np.where(rfm_df['RFM_Score'] > 4.5, "Top Customers",
                                       np.where(rfm_df['RFM_Score'] > 4, "High Value Customer",
                                                np.where(rfm_df['RFM_Score'] > 3, "Medium Value Customer",
                                                         np.where(rfm_df['RFM_Score'] > 1.6, 'Low Value Customers', 'Lost Customers'))))

# Displaying the first 20 rows with CustomerID, RFM_Score, and Customer Segment
rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)

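
The same thresholds can also be expressed with pd.cut, which some readers find easier to maintain than nested np.where calls. This is an alternative sketch, not the article's method. The scores are hypothetical values chosen to sit on and just above each boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical scores placed on and just above the segment boundaries
scores = pd.Series([1.6, 1.61, 3.0, 4.0, 4.5, 4.51])

# Bin edges from the segment definitions; by default each bin includes its right edge
bins = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
labels = [
    "Lost Customers",
    "Low Value Customers",
    "Medium Value Customer",
    "High Value Customer",
    "Top Customers",
]

segments = pd.cut(scores, bins=bins, labels=labels)
print(segments.astype(str).tolist())
```

Because the bins are closed on the right, a score of exactly 4.5 falls into High Value Customer and only scores above 4.5 become Top Customers, matching the rules listed above.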

Visualizing the Customer Segments

Finally, we visualize the customer segments using a pie chart, which shows the proportion of each customer segment.

Python

# Count customers per segment once and reuse the result for sizes and labels
segment_counts = rfm_df['Customer_segment'].value_counts()
plt.pie(segment_counts,
        labels=segment_counts.index,
        autopct='%.0f%%')
plt.show()

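
The percentages drawn on the pie chart can also be read as plain numbers, which is handy for reports. The sketch below computes them with value_counts(normalize=True) on a hypothetical list of segment labels.

```python
import pandas as pd

# Hypothetical segment labels for ten customers
segments = pd.Series(
    ["Low Value Customers"] * 5 + ["Lost Customers"] * 3 + ["Top Customers"] * 2
)

# Share of each segment as a percentage of all customers
share = segments.value_counts(normalize=True).mul(100).round(0)
print(share)
```

On the real data the same call would be applied to rfm_df['Customer_segment'].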


anuragnayak
