Customer Segmentation Analysis
No single policy meets a one-size-fits-all criterion, because customers have unique
needs and expectations. This is where customer segmentation analysis can save a lot of
time and improve results.
In a customer segmentation project, data analysts identify groups of customers with
similar needs and behaviors so that companies can tailor their marketing, product
development, and customer service strategies to each group. Customers can be grouped by
attributes such as marital status, or by whether they are new or repeat customers.
Today, over 60% of companies are inclined toward customer choices, making them
advocates of customer segmentation and of platforms (or tools) such as Google Analytics
and Customer.io.
Luxury car manufacturers like Rolls Royce often use lifestyle-centric segmentation
analysis to segment their top customers. Clearly, a data analyst familiar with customer
segmentation would be a great asset to such businesses.
Analysis Approach
The first step towards generating useful insights from the data was data preparation,
quality assessment, and data cleaning. After cleaning, exploratory data analysis can be
performed on the dataset to identify customer purchasing behaviours and generate
insights.
In the data cleaning step, the data quality of the following datasets was assessed first.
The assessment revealed the data quality issues listed below, and each was mitigated as
described:
● CustomerDemographics.xlsx :
o 1 irrelevant column was present and was dropped from the dataset.
o Missing values were present in 5 columns. Depending on the volume of
missing values in each column, either the affected records were dropped or
appropriate values were imputed in place of the missing values.
o The gender column was not standardised. Based on the values available, the
column data was standardised to remove the inconsistency.
o The Date of Birth column was transformed to create the new feature columns
'Age' and 'Age Group' to check for discrepancies in the age distribution.
An outlier was observed and the record was removed.
o The dataset was checked for duplicate records. No duplicate records were
found.
● NewCustomerList.xlsx :
o 5 irrelevant columns were present and were dropped from the dataset.
o Missing values were present in 4 columns. Depending on the volume of
missing values in each column, either the affected records were dropped or
appropriate values were imputed in place of the missing values.
o The Date of Birth column was transformed to create the new feature columns
'Age' and 'Age Group' to check for discrepancies in the age distribution.
o There was no data inconsistency.
o The dataset was checked for duplicate records. No duplicate records were
found.
● Transaction_data.xlsx :
o The product_first_sold_date column was not in datetime format. The data type
of this column was changed from int64 to datetime.
o Missing values were present in 7 columns. Depending on the volume of
missing values in each column, either the affected records were dropped or
appropriate values were imputed in place of the missing values.
o A new feature column 'Profit' was created as the difference between list
price and standard price.
o There was no data inconsistency.
o The dataset was checked for duplicate records. No duplicate records were
found.
● CustomerAddress.xlsx :
o The states column was not standardised. Based on the values available, the
column data was standardised to remove the inconsistency.
o Certain customer IDs from the Customer Demographics table were missing
from the Address table.
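The cleaning steps listed above can be sketched in plain Python. This is only an illustrative sketch, not the code used in the analysis. The gender mapping, the 5% drop threshold, the ten-year age bands, the rule that rare gaps are dropped while larger gaps are imputed with the most common value, and the reading of the int64 product_first_sold_date values as Excel serial day numbers are all assumptions, as are the function and parameter names.

```python
from datetime import date, timedelta
from statistics import mode

# Hypothetical mapping: the raw gender spellings are not listed in the report.
GENDER_MAP = {"f": "Female", "femal": "Female", "female": "Female",
              "m": "Male", "male": "Male"}
# Assumption: the int64 date values are Excel serial day numbers.
EXCEL_EPOCH = date(1899, 12, 30)


def standardise_gender(raw):
    """Map a raw gender label onto one canonical spelling."""
    return GENDER_MAP.get(str(raw).strip().lower(), "Unspecified")


def age_on(dob, today):
    """Age in completed years on the reference date."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def age_group(age, width=10):
    """Label the band that contains the age, e.g. 39 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


def profit(list_price, standard_price):
    """Profit as the difference between list price and standard price."""
    return list_price - standard_price


def handle_missing(rows, column, drop_below=0.05):
    """Drop rows lacking `column` if they are rare, otherwise impute the mode."""
    present = [r[column] for r in rows if r.get(column) is not None]
    n_missing = len(rows) - len(present)
    if n_missing == 0 or not present:
        return list(rows)  # nothing to fix, or nothing to impute from
    if n_missing / len(rows) < drop_below:
        return [r for r in rows if r.get(column) is not None]
    fill = mode(present)
    return [r if r.get(column) is not None else {**r, column: fill}
            for r in rows]


def ids_missing_from(reference_ids, other_ids):
    """IDs present in the first table but absent from the second."""
    return sorted(set(reference_ids) - set(other_ids))
```

For example, `ids_missing_from(demographics_ids, address_ids)` would list the customer IDs that appear in the demographics table but have no address record, which is the kind of check behind the last CustomerAddress.xlsx observation.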
RFM analysis scores each customer on three measures:
● Recency (R): Who has purchased recently? The number of days since the last
purchase (lower is better).
● Frequency (F): Who has purchased frequently? The total number of purchases
(higher is better).
● Monetary Value (M): Who has a high purchase amount? The total money the
customer has spent (higher is better).
A weighted method is then applied. The scores for recency, frequency, and monetary value
are determined by dividing each measure into 4 quartiles (bounded by the min, 25%, 50%,
75% and max values), and the weights are as follows: recency = 100, frequency = 10, and
monetary = 1.
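A minimal sketch of this scoring follows, using only the Python standard library. The report gives the quartile split and the 100/10/1 weights, but not the direction of the scores or how ties at a quartile boundary are handled. The sketch assumes that 4 is the best quartile on every measure, that a value equal to a cut point falls in the lower quartile, and that the weighted scores are summed; the function names and the input layout are hypothetical.

```python
from statistics import quantiles


def quartile_score(value, cuts, higher_is_better=True):
    """Return a score from 1 to 4, where 4 is always the best quartile."""
    # Count how many of the three cut points the value exceeds.
    q = 1 + sum(value > c for c in cuts)  # 1 = lowest quartile, 4 = highest
    return q if higher_is_better else 5 - q


def rfm_scores(customers):
    """customers maps id -> (recency_days, frequency, monetary)."""
    values = list(customers.values())
    # Three cut points (25%, 50%, 75%) per measure.
    r_cuts = quantiles([v[0] for v in values], n=4, method="inclusive")
    f_cuts = quantiles([v[1] for v in values], n=4, method="inclusive")
    m_cuts = quantiles([v[2] for v in values], n=4, method="inclusive")
    scores = {}
    for cid, (recency, frequency, monetary) in customers.items():
        r = quartile_score(recency, r_cuts, higher_is_better=False)
        f = quartile_score(frequency, f_cuts)
        m = quartile_score(monetary, m_cuts)
        # Weights from the text: recency = 100, frequency = 10, monetary = 1.
        scores[cid] = 100 * r + 10 * f + m
    return scores
```

With these weights the combined value reads as three digits, one per measure, so a customer in the best quartile on all three measures scores 444 and one in the worst quartile on all three scores 111.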
In this analysis the customers were divided into 11 groups. The groups are:
4. RFM Analysis