RFM Analysis Using Python
Last Updated: 01 May, 2025
RFM stands for Recency, Frequency, and Monetary value. In business analytics, RFM is commonly used to divide customers into segments such as high-value, medium-value, and low-value customers.
The goal is to calculate the RFM scores and categorize customers into segments like Top Customers, High Value Customers, Medium Value Customers, Low Value Customers, and Lost Customers. This segmentation helps businesses target their marketing efforts effectively.
Let’s assume we are a company named geek and perform RFM analysis on our customers. The three metrics are:
- Recency: how recently the customer made a transaction with us
- Frequency: how often the customer buys products from us
- Monetary: how much the customer spends on our products
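As a quick illustration of the three definitions, the sketch below computes R, F, and M by hand for a few made-up transactions using only the standard library. The customer IDs, dates, and amounts are hypothetical, and Recency is measured against the latest purchase date in the data, which is the same reference the pandas code later uses.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("C1", date(2023, 6, 1), 120.0),
    ("C1", date(2023, 6, 20), 80.0),
    ("C2", date(2023, 3, 15), 300.0),
]

# Reference date: the latest purchase date anywhere in the data
reference = max(d for _, d, _ in transactions)

rfm = {}
for cid in sorted({c for c, _, _ in transactions}):
    rows = [(d, amt) for c, d, amt in transactions if c == cid]
    recency = (reference - max(d for d, _ in rows)).days  # days since last purchase
    frequency = len(rows)                                  # number of purchases
    monetary = sum(amt for _, amt in rows)                 # total spend
    rfm[cid] = (recency, frequency, monetary)

print(rfm)  # {'C1': (0, 2, 200.0), 'C2': (97, 1, 300.0)}
```

C1 bought most recently, more often, but spent less in total than C2, which is exactly the kind of trade-off the weighted RFM score later resolves.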
Getting Started
Loading the Necessary Libraries and the Data
Here, we import the required modules (pandas, datetime, numpy, and matplotlib) and read the dataset into a DataFrame.
Dataset used: rfm_data.csv
Python
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
# Importing the data from the raw file URL
# (a github.com/.../blob/... page URL returns HTML, not CSV)
df = pd.read_csv('https://raw.githubusercontent.com/M0hamedIbrahim1/RFM-Analysis/main/Dataset/rfm_data.csv')
df.head()
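To experiment offline, a small in-memory CSV can stand in for rfm_data.csv. This sketch assumes only the three columns the later steps rely on (CustomerID, PurchaseDate, TransactionAmount); the rows are made up, and the real file may contain more columns. Passing parse_dates to read_csv converts PurchaseDate to datetime while loading.

```python
import io

import pandas as pd

# Hypothetical CSV text with the columns used in this tutorial
csv_text = """CustomerID,PurchaseDate,TransactionAmount
101,2023-04-11,943.31
101,2023-06-02,120.50
102,2023-04-12,463.70
103,2023-05-20,80.28
"""

# parse_dates converts PurchaseDate to datetime64 at load time
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["PurchaseDate"])
print(df.dtypes)
print(df.head())
```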

Calculating Recency
In this step, we calculate Recency for each customer: the number of days between the customer's most recent purchase and the latest purchase date in the whole dataset, which serves as the reference date. A smaller Recency means the customer bought more recently.
Python
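df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])  # Converting PurchaseDate to datetime
# Most recent purchase date for each customer
df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
# Reference date: the latest purchase date in the whole dataset
recent_date = df_recency['LastPurchaseDate'].max()
# Recency = days between the reference date and each customer's last purchase
df_recency['Recency'] = df_recency['LastPurchaseDate'].apply(lambda x: (recent_date - x).days)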
df_recency.head()
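The apply() call works, but pandas can compute the same day differences in one vectorized step: subtracting the reference Timestamp from a datetime column yields a timedelta column whose .dt.days accessor gives whole days. The sketch below uses a few hypothetical transactions and checks that both approaches agree.

```python
import pandas as pd

# Hypothetical transactions for illustration
df = pd.DataFrame({
    "CustomerID": [101, 101, 102],
    "PurchaseDate": pd.to_datetime(["2023-06-01", "2023-06-20", "2023-03-15"]),
})

last = df.groupby("CustomerID", as_index=False)["PurchaseDate"].max()
last.columns = ["CustomerID", "LastPurchaseDate"]

# Vectorized Recency: Timestamp minus datetime column gives timedeltas
reference = last["LastPurchaseDate"].max()
last["Recency"] = (reference - last["LastPurchaseDate"]).dt.days

# Same result as the row-by-row apply() version
via_apply = last["LastPurchaseDate"].apply(lambda x: (reference - x).days)
assert (via_apply == last["Recency"]).all()
print(last)
```

A common variation sets the reference date one day after the latest purchase (reference + pd.Timedelta(days=1)) so that no customer has a Recency of 0; the ranking step is unaffected because it only depends on the order of the values.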

Calculating Frequency
We now calculate Frequency: the number of purchases each customer made, counted as the number of transaction rows per customer after exact duplicate rows are removed.
Python
# Drop exact duplicate rows, then count the remaining transactions per customer
frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()
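The drop_duplicates() call matters when the same row appears more than once in the file. The sketch below uses hypothetical data in which one row is an exact copy of another, and compares the counts with and without de-duplication.

```python
import pandas as pd

# Hypothetical data: the second row is an exact duplicate of the first
df = pd.DataFrame({
    "CustomerID": [101, 101, 101, 102],
    "PurchaseDate": pd.to_datetime(["2023-06-01", "2023-06-01", "2023-06-20", "2023-03-15"]),
    "TransactionAmount": [50.0, 50.0, 30.0, 300.0],
})

# Without de-duplication the copied row is counted twice
raw = df.groupby("CustomerID")["PurchaseDate"].count()
# With de-duplication only distinct rows are counted
dedup = df.drop_duplicates().groupby("CustomerID")["PurchaseDate"].count()
print(raw.to_dict())
print(dedup.to_dict())
```

One caveat: if the data has no unique transaction identifier, two genuinely separate purchases with the same customer, date, and amount would also be collapsed into one, so it is worth checking how the source system records transactions.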

Calculating Monetary Value
Next, we calculate the Monetary value, which is the total amount each customer has spent. It is computed by summing TransactionAmount for each customer.
Python
# Total amount spent by each customer (sum of TransactionAmount per CustomerID)
monetary_df = df.groupby(by='CustomerID', as_index=False)['TransactionAmount'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()
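One detail worth knowing about this sum: pandas' groupby().sum() skips missing values by default, so a transaction with an empty amount silently contributes nothing. The sketch below, on hypothetical data with one missing amount, shows this behavior and counts the missing values so they can be investigated instead of ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions; one amount is missing
df = pd.DataFrame({
    "CustomerID": [101, 101, 102],
    "TransactionAmount": [120.0, np.nan, 300.0],
})

# Count missing amounts before aggregating
missing = int(df["TransactionAmount"].isna().sum())
print("missing amounts:", missing)

# groupby().sum() skips NaN, so customer 101's missing amount is ignored
monetary_df = df.groupby("CustomerID", as_index=False)["TransactionAmount"].sum()
monetary_df.columns = ["CustomerID", "Monetary"]
print(monetary_df)
```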

Merging All Three Columns in One DataFrame
At this point, we merge the Recency, Frequency, and Monetary values into one DataFrame for analysis.
Python
rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()
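The three groupby calls and two merges can also be collapsed into a single groupby with named aggregation. This is an alternative sketch, not the tutorial's method, run on hypothetical data. Note that it applies drop_duplicates() to all three metrics, whereas the steps above de-duplicate only for Frequency; the results match when the data has no exact duplicate rows.

```python
import pandas as pd

# Hypothetical transactions for illustration
df = pd.DataFrame({
    "CustomerID": [101, 101, 102],
    "PurchaseDate": pd.to_datetime(["2023-06-01", "2023-06-20", "2023-03-15"]),
    "TransactionAmount": [120.0, 80.0, 300.0],
})

# One pass: each keyword becomes an output column
rfm_df = df.drop_duplicates().groupby("CustomerID", as_index=False).agg(
    LastPurchaseDate=("PurchaseDate", "max"),
    Frequency=("PurchaseDate", "count"),
    Monetary=("TransactionAmount", "sum"),
)

# Recency relative to the latest purchase in the data
reference = rfm_df["LastPurchaseDate"].max()
rfm_df["Recency"] = (reference - rfm_df["LastPurchaseDate"]).dt.days
rfm_df = rfm_df[["CustomerID", "Recency", "Frequency", "Monetary"]]
print(rfm_df)
```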

Ranking Customers Based on Recency, Frequency, and Monetary Values
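We then rank the customers on each metric. Recency is ranked in descending order so that the most recent customers (smallest Recency) receive the highest rank, while Frequency and Monetary are ranked in ascending order so that larger values receive higher ranks. Each rank is then divided by its maximum and multiplied by 100, which puts all three metrics on a common scale where the top-ranked customer scores 100.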
Python
# Ranking customers based on Recency, Frequency, and Monetary
rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)
# Normalizing the ranks
rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100
# Dropping the individual ranks
rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()
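To see what the ranking and normalization produce, the sketch below applies the same steps to four hypothetical customers. Two of them share the same Recency; pandas' default rank method ("average") gives tied values the mean of the ranks they occupy. Because the lowest rank is 1 rather than 0, the smallest normalized value is above 0.

```python
import pandas as pd

# Hypothetical Recency and Frequency values for four customers
rfm_df = pd.DataFrame({
    "Recency": [0, 97, 30, 30],
    "Frequency": [5, 1, 2, 3],
})

# Recency: smaller is better, so rank descending (most recent gets the highest rank)
r_rank = rfm_df["Recency"].rank(ascending=False)
# Frequency: larger is better, so rank ascending
f_rank = rfm_df["Frequency"].rank(ascending=True)

# Tied Recency values (30, 30) share the average of ranks 2 and 3, i.e. 2.5
rfm_df["R_rank_norm"] = r_rank / r_rank.max() * 100
rfm_df["F_rank_norm"] = f_rank / f_rank.max() * 100
print(rfm_df)
```

Here R_rank_norm comes out as 100, 25, 62.5, 62.5 and F_rank_norm as 100, 25, 50, 75.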

Calculating the RFM Score
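Now we calculate the RFM score, a weighted sum of the normalized Recency, Frequency, and Monetary ranks that is used to categorize the customers. The weights (0.15, 0.28, 0.57) sum to 1, so the weighted sum is at most 100; multiplying it by 0.05 rescales the score to at most 5, the scale used by the segment thresholds below.
The formula for calculating the RFM score is: RFM Score = 0.05 * (0.15 * R_rank_norm + 0.28 * F_rank_norm + 0.57 * M_rank_norm)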
Python
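# Weighted sum of the normalized ranks (the weights sum to 1)
rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
# Rescale from the 0-100 range to 0-5
rfm_df['RFM_Score'] *= 0.05
# Round all numeric columns to two decimals
rfm_df = rfm_df.round(2)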
# Displaying the first few rows with CustomerID, RFM_Score
rfm_df[['CustomerID', 'RFM_Score']].head(7)
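To make the arithmetic concrete, the formula can be written as a small function and applied to a hypothetical customer whose normalized ranks are R = 50, F = 80, and M = 90. The function name rfm_score is illustrative only; it is not part of the tutorial code.

```python
# Weights from the tutorial's formula; they sum to 1.0
WEIGHTS = {"R": 0.15, "F": 0.28, "M": 0.57}


def rfm_score(r_norm: float, f_norm: float, m_norm: float) -> float:
    """Weighted sum of normalized ranks (at most 100), rescaled to at most 5."""
    weighted = WEIGHTS["R"] * r_norm + WEIGHTS["F"] * f_norm + WEIGHTS["M"] * m_norm
    return round(weighted * 0.05, 2)


# Hypothetical customer: R=50, F=80, M=90
print(rfm_score(50, 80, 90))  # 4.06
```

Step by step: 0.15 * 50 + 0.28 * 80 + 0.57 * 90 = 7.5 + 22.4 + 51.3 = 81.2, and 81.2 * 0.05 = 4.06, which falls in the 4 < score <= 4.5 band (High Value). A customer ranked first on all three metrics scores exactly 5.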

Rating Customers Based on the RFM Score
We assign customer segments based on their RFM score:
- Top Customers: RFM score > 4.5
- High Value Customers: 4 < RFM score <= 4.5
- Medium Value Customers: 3 < RFM score <= 4
- Low Value Customers: 1.6 < RFM score <= 3
- Lost Customers: RFM score <= 1.6
Python
# Thresholds are checked from the highest down; the first match wins
rfm_df['Customer_segment'] = np.where(rfm_df['RFM_Score'] > 4.5, 'Top Customers',
                             np.where(rfm_df['RFM_Score'] > 4, 'High Value Customers',
                             np.where(rfm_df['RFM_Score'] > 3, 'Medium Value Customers',
                             np.where(rfm_df['RFM_Score'] > 1.6, 'Low Value Customers',
                                      'Lost Customers'))))
# Displaying the first 20 rows with CustomerID, RFM_Score, and Customer Segment
rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)
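The nested np.where calls can also be expressed with pd.cut, which maps values to labelled, right-closed intervals; this is an alternative sketch, not the tutorial's method. The bin edges below reproduce the thresholds listed above, so a score of exactly 4.5 lands in High Value and exactly 1.6 lands in Lost. The scores are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM scores, including values exactly on the boundaries
scores = pd.Series([4.8, 4.5, 4.06, 3.2, 2.0, 1.6, 0.9])

# Right-closed bins: (-inf, 1.6], (1.6, 3], (3, 4], (4, 4.5], (4.5, inf]
bins = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
labels = ["Lost Customers", "Low Value Customers", "Medium Value Customers",
          "High Value Customers", "Top Customers"]

segments = pd.cut(scores, bins=bins, labels=labels, right=True)
print(pd.DataFrame({"RFM_Score": scores, "Customer_segment": segments}))
```

Keeping the edges and labels in two lists makes the thresholds easy to audit or change in one place.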

Visualizing the Customer Segments
Finally, we visualize the customer segments using a pie chart, which shows the proportion of each customer segment.
Python
# Count customers per segment once and reuse the counts for wedge sizes and labels
segment_counts = rfm_df['Customer_segment'].value_counts()
plt.pie(segment_counts, labels=segment_counts.index, autopct='%.0f%%')
plt.show()
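The percentages printed on the pie wedges are each segment's share of all customers. To get those numbers as a table instead of reading them off the chart, value_counts(normalize=True) returns the same proportions as fractions. The segment labels below are hypothetical sample values.

```python
import pandas as pd

# Hypothetical segment assignments for four customers
segments = pd.Series(["Top Customers", "Lost Customers", "Top Customers", "Low Value Customers"])

# normalize=True returns fractions that sum to 1; scale to percent
shares = segments.value_counts(normalize=True) * 100
print(shares.round(1))
```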