RFM Analysis Using Python

Last Updated : 01 May, 2025

RFM stands for Recency, Frequency, and Monetary value. In business analytics, this model is often used to divide customers into segments such as high-value, medium-value, and low-value customers, among others.

The goal is to calculate the RFM scores and categorize customers into segments like Top Customers, High Value Customers, Medium Value Customers, Low Value Customers, and Lost Customers. This segmentation helps businesses target their marketing efforts effectively.

Let’s assume we are a company named Geek and perform an RFM analysis on our customers. Each customer is described by three measures:

  1. Recency: How recently the customer made a transaction with us.
  2. Frequency: How often the customer orders or buys products from us.
  3. Monetary: How much the customer spends on our products.
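
To make the three measures concrete, here is a minimal plain-Python sketch for a single hypothetical customer with two purchases. The dates, amounts, and reference date are invented for illustration; later in this article the reference date is the latest purchase date in the dataset.

```python
from datetime import date

# Hypothetical purchase history of one customer: (purchase date, amount)
purchases = [(date(2023, 1, 5), 100.0), (date(2023, 3, 1), 50.0)]

# Assumed reference date, e.g. the latest purchase date in the whole dataset
reference_date = date(2023, 3, 10)

recency = (reference_date - max(d for d, _ in purchases)).days  # days since last purchase
frequency = len(purchases)                                      # number of purchases
monetary = sum(amount for _, amount in purchases)               # total spend

print(recency, frequency, monetary)  # 9 2 150.0
```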

Getting Started

Loading the Necessary Libraries and the Data

Here, we import the required modules (pandas, datetime, NumPy, and Matplotlib) and read the dataset into a DataFrame.

Dataset used: rfm_data.csv
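
If you want to check the schema the rest of this article relies on, the sketch below reads a small inline sample instead of the real file. The sample values are invented, and it assumes only that the file has `CustomerID`, `PurchaseDate`, and `TransactionAmount` columns, since those are the only ones the later code uses. Passing `parse_dates` converts `PurchaseDate` to datetime while reading.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt with made-up values; the real rfm_data.csv
# may contain more columns than the three this article uses.
csv_text = """CustomerID,PurchaseDate,TransactionAmount
101,2023-04-11,250.00
102,2023-04-12,80.50
101,2023-05-02,120.00
"""

# parse_dates converts PurchaseDate to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["PurchaseDate"])

# Fail early if a column needed for the RFM steps is missing
required = {"CustomerID", "PurchaseDate", "TransactionAmount"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Missing required columns: {sorted(missing)}")

print(df.dtypes)
```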

Python
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

# Importing the data (the raw file URL returns CSV text; the GitHub "blob" page returns HTML)
df = pd.read_csv('https://raw.githubusercontent.com/M0hamedIbrahim1/RFM-Analysis/main/Dataset/rfm_data.csv')
df.head()

Calculating Recency

In this step, we calculate Recency for each customer. We find each customer’s most recent purchase date and count the days between it and the latest purchase date in the whole dataset, so a Recency of 0 means the customer bought on the most recent date recorded.
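
A small sketch of the same calculation on a toy frame of three hypothetical customers (the dates are invented). It checks that the vectorized `.dt.days` accessor gives the same day counts as the row-by-row `apply` form, while avoiding a Python-level lambda call per row.

```python
import pandas as pd

# Hypothetical last-purchase dates for three customers
df_recency = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "LastPurchaseDate": pd.to_datetime(["2023-06-01", "2023-05-15", "2023-06-30"]),
})

# Reference date: the latest purchase date across all customers
recent_date = df_recency["LastPurchaseDate"].max()

# Row-by-row version using apply
via_apply = df_recency["LastPurchaseDate"].apply(lambda x: (recent_date - x).days)

# Vectorized version: subtract once, then read whole days from the timedeltas
via_dt = (recent_date - df_recency["LastPurchaseDate"]).dt.days

df_recency["Recency"] = via_dt
print(df_recency)
```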

Python
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])  # Converting PurchaseDate to datetime
df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
recent_date = df_recency['LastPurchaseDate'].max()
df_recency['Recency'] = df_recency['LastPurchaseDate'].apply(lambda x: (recent_date - x).days)
df_recency.head()

Calculating Frequency

We now calculate Frequency: the number of purchases each customer made. Exact duplicate rows are dropped first so that a transaction recorded twice is counted only once.
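
Note that `drop_duplicates()` removes only rows that are identical in every column, so two different transactions on the same day still count separately. If you would rather count distinct purchase days, `nunique` is one option. The sketch below compares both on invented data where customer 1 has two transactions on the same date.

```python
import pandas as pd

# Hypothetical transactions: customer 1 has two different transactions on 2023-01-05
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "PurchaseDate": pd.to_datetime(["2023-01-05", "2023-01-05", "2023-02-01", "2023-01-10"]),
    "TransactionAmount": [10.0, 25.0, 40.0, 15.0],
})

# Counting rows: every transaction counts, even several on the same day
freq_rows = df.drop_duplicates().groupby("CustomerID")["PurchaseDate"].count()

# Counting distinct purchase dates: same-day transactions count once
freq_days = df.groupby("CustomerID")["PurchaseDate"].nunique()

print(pd.DataFrame({"rows": freq_rows, "distinct_days": freq_days}))
```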

Python
frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()

Calculating Monetary Value

Next, we calculate the Monetary Value, which is the total amount each customer has spent. This is computed by summing up the TransactionAmount for each customer.
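
A small sketch of the monetary step on invented transactions. It renames the summed column by name with `rename` instead of overwriting `columns` positionally, which keeps working if the column order ever changes; this is a stylistic choice, not something the original code requires.

```python
import pandas as pd

# Hypothetical transactions (toy values)
df = pd.DataFrame({
    "CustomerID": [1, 2, 1, 3],
    "TransactionAmount": [120.5, 60.0, 79.5, 300.0],
})

# Sum the amounts per customer and give the result an explicit name
monetary_df = (
    df.groupby("CustomerID", as_index=False)["TransactionAmount"]
      .sum()
      .rename(columns={"TransactionAmount": "Monetary"})
)
print(monetary_df)
```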

Python
# Sum of all transaction amounts per customer
monetary_df = df.groupby(by='CustomerID', as_index=False)['TransactionAmount'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()

Merging All Three Columns in One DataFrame

At this point, we merge the Recency, Frequency, and Monetary values into one DataFrame for analysis.
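
Because `merge` defaults to an inner join, a duplicated `CustomerID` in any of the three tables would silently multiply rows. A minimal sketch, using invented per-customer tables, of how the `validate` argument of `DataFrame.merge` can guard against that:

```python
import pandas as pd

# Hypothetical per-customer tables (toy values)
recency = pd.DataFrame({"CustomerID": [1, 2, 3], "Recency": [9, 28, 0]})
frequency = pd.DataFrame({"CustomerID": [1, 2, 3], "Frequency": [2, 1, 3]})
monetary = pd.DataFrame({"CustomerID": [1, 2, 3], "Monetary": [150.0, 300.0, 90.0]})

# validate="one_to_one" raises pandas.errors.MergeError if CustomerID
# is duplicated on either side of a merge
rfm_df = (
    recency.merge(frequency, on="CustomerID", validate="one_to_one")
           .merge(monetary, on="CustomerID", validate="one_to_one")
)
print(rfm_df)
```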

Python
rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()

Ranking Customers Based on Recency, Frequency, and Monetary Values

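We then rank the customers on Recency, Frequency, and Monetary value. Recency is ranked in descending order, so the customer with the fewest days since their last purchase gets the highest rank; Frequency and Monetary are ranked in ascending order, so larger values get higher ranks. Each rank is then divided by its maximum and multiplied by 100, which puts all three on a common scale with values greater than 0 and at most 100.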
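
To see how the ranking and normalization behave, the sketch below applies them to the Recency values of four hypothetical customers. Two customers share the same Recency, so pandas gives them the average of the ranks they occupy (its default `method='average'`), and the smallest normalized value stays above 0.

```python
import pandas as pd

# Hypothetical Recency values (days since last purchase) for four customers
rfm_df = pd.DataFrame({"CustomerID": [1, 2, 3, 4], "Recency": [9, 28, 0, 28]})

# Descending rank: the largest Recency (longest since last purchase) gets rank 1,
# so the most recent buyer gets the highest rank; ties share their average rank
rfm_df["R_rank"] = rfm_df["Recency"].rank(ascending=False)

# Divide by the maximum rank and scale to 100
rfm_df["R_rank_norm"] = rfm_df["R_rank"] / rfm_df["R_rank"].max() * 100
print(rfm_df)
```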

Python
# Ranking customers based on Recency, Frequency, and Monetary
rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)

# Normalizing the ranks
rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100

# Dropping the individual ranks
rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()

Calculating the RFM Score

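Now we calculate the RFM score, which is a weighted sum of the normalized Recency, Frequency, and Monetary ranks. This score is what we use to categorize the customers.

The formula for the RFM score is: RFM Score = 0.05 * ((0.15 * R_rank_norm) + (0.28 * F_rank_norm) + (0.57 * M_rank_norm))

The weights add up to 1 and each normalized rank is at most 100, so the weighted sum is at most 100; multiplying by 0.05 rescales the score to a maximum of 5.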
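
As a worked example, the hypothetical helper `rfm_score` below (not part of the original code) applies the weights 0.15, 0.28, and 0.57 and the 0.05 rescaling to one customer whose normalized ranks are R = 100, F = 50, and M = 80: the weighted sum is 15 + 14 + 45.6 = 74.6, and 74.6 * 0.05 = 3.73.

```python
# Weights for the normalized R, F, and M ranks
WEIGHTS = {"R": 0.15, "F": 0.28, "M": 0.57}


def rfm_score(r_norm: float, f_norm: float, m_norm: float) -> float:
    """Weighted sum of normalized ranks (each at most 100), rescaled to at most 5."""
    weighted = WEIGHTS["R"] * r_norm + WEIGHTS["F"] * f_norm + WEIGHTS["M"] * m_norm
    return round(weighted * 0.05, 2)


# Hypothetical customer: 15 + 14 + 45.6 = 74.6, and 74.6 * 0.05 = 3.73
print(rfm_score(100, 50, 80))  # 3.73
```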

Python
rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
rfm_df['RFM_Score'] *= 0.05  # Rescaling from a 0-100 range to a 0-5 range
rfm_df = rfm_df.round(2)

# Displaying the first few rows with CustomerID, RFM_Score
rfm_df[['CustomerID', 'RFM_Score']].head(7)

Rating Customers Based on the RFM Score

We assign customer segments based on their RFM score:

  • Top Customers: RFM score > 4.5
  • High Value Customers: 4 < RFM score <= 4.5
  • Medium Value Customers: 3 < RFM score <= 4
  • Low Value Customers: 1.6 < RFM score <= 3
  • Lost Customers: RFM score <= 1.6
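
The same thresholds can also be expressed as right-closed intervals with `pd.cut`, an alternative to chained conditional checks. In this sketch the scores are invented and include boundary values such as 4.5, 3.0, and 1.6 to show which segment each boundary falls into.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM scores (toy values), including exact boundary values
scores = pd.Series([4.8, 4.5, 3.2, 3.0, 1.6, 0.9])

# Right-closed bins (a, b] reproduce the thresholds listed above
bins = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
labels = ["Lost Customers", "Low Value Customers", "Medium Value Customers",
          "High Value Customers", "Top Customers"]
segments = pd.cut(scores, bins=bins, labels=labels, right=True)

print(pd.DataFrame({"RFM_Score": scores, "Customer_segment": segments}))
```
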
Python
# Conditions are checked in order; the first one that is True sets the segment
conditions = [
    rfm_df['RFM_Score'] > 4.5,
    rfm_df['RFM_Score'] > 4,
    rfm_df['RFM_Score'] > 3,
    rfm_df['RFM_Score'] > 1.6,
]
segments = ['Top Customers', 'High Value Customers', 'Medium Value Customers', 'Low Value Customers']
rfm_df['Customer_segment'] = np.select(conditions, segments, default='Lost Customers')

# Displaying the first 20 rows with CustomerID, RFM_Score, and Customer Segment
rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)

Visualizing the Customer Segments

Finally, we visualize the customer segments using a pie chart, which shows the proportion of each customer segment.
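
The percentages on the pie chart are each segment's share of all customers. A small sketch with invented labels shows how to get those shares as numbers with `value_counts(normalize=True)`, which can be handy to print alongside the chart.

```python
import pandas as pd

# Hypothetical segment labels (toy values)
segments = pd.Series(["Top Customers", "Lost Customers", "Top Customers", "Low Value Customers"])

# normalize=True returns each segment's fraction of all rows; scale to percent
shares = segments.value_counts(normalize=True) * 100
print(shares.round(0))
```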

Python
segment_counts = rfm_df['Customer_segment'].value_counts()  # Customers per segment
plt.pie(segment_counts, labels=segment_counts.index, autopct='%.0f%%')
plt.show()
