A Beginner's Guide To Customer Segmentation With Python - by Sigli Mumuni - Medium
2022, 16:42 A Beginner’s Guide to Customer Segmentation with Python | by Sigli Mumuni | Medium
Customer segmentation is the process of splitting your customer base into different
groups based on common characteristics. These characteristics are usually
demographic, like age, sex, and income, but psychographic or behavioral
characteristics like personality, interests, and habits are often considered as well.
Customer segmentation allows a business to deliver more targeted and effective
marketing that appeals to the different segments identified.
While customer segmentation has been around for as long as marketing itself, recent
advances in machine learning have made the process easier and more accurate. We can
easily implement customer segmentation using clustering analysis, a type of
unsupervised machine learning technique that places subjects in different groups (or
clusters) based on how closely associated they are with each other.
The dataset contains the following columns:
Customer ID
Gender
Age
Annual Income (k$)
Spending Score (1-100)
You can download the dataset from the Kaggle website or my GitHub repository if you
want to follow along. All the relevant code used in this tutorial is also available in my
GitHub repository.
We load the dataset with the pandas read_csv() method, passing the URL of the file as the
argument. If you have downloaded the file to your computer, be sure to enter the
file path instead.
import pandas as pd

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/siglimumuni/Datasets/master/Mall_Customers.c

# View the first 5 rows
df.head()
We can get a glimpse of the dataset by using the head() method to display the first 5
rows of data. We can also use the info() method to get a quick breakdown of the
structure of the dataset including the number of rows and columns and data types of
all the columns as well as information on missing values.
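As a quick illustration of what these two methods return, here is a minimal sketch on a handful of made-up rows. The values in `sample` are hypothetical stand-ins that only mimic the column layout; they are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical rows that mimic the column layout of the tutorial dataset
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", "Male", "Female", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23, 31, 45],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 40],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 55],
})

# head() returns the first 5 rows by default
first_rows = sample.head()
print(first_rows)

# info() prints the row count, column dtypes and non-null counts
sample.info()

# An explicit per-column count of missing values
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

The explicit `isnull().sum()` call is optional; it simply spells out the missing-value information that `info()` reports indirectly through its non-null counts.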
https://medium.com/@siglimumuni/a-beginners-guide-to-customer-segmentation-with-python-fc8c219d6fa3 3/14
We have a total of 200 rows of data and 5 columns, 4 of which are integers and 1 of
which is a string object. The dataset contains no null values.
Before we perform our cluster analysis, we will conduct an exploratory data analysis to
better understand the characteristics of the dataset and familiarize ourselves with the
relationships between the different variables.
The mean age is 38.85 and the mean annual income is around $60,000. We can
explore these variables in more depth by visualizing their distributions with
histograms. We can create multiple plots side by side by calling plt.subplots()
and then drawing on each subplot in turn with the histplot() function in seaborn.
There’s a wide range of ages represented, with most customers falling in
the 20–40 year range. The majority of customers are in the 60 to 80 thousand
dollar annual income bracket, while most customers’ spending scores are between 40 and
60.
Next, let’s check the proportion of males and females in the dataset. We can use the
countplot() method in seaborn to create a bar chart.
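A minimal sketch of that bar chart, using a tiny made-up `Gender` column rather than the real data, might look like this. The `value_counts()` call is an addition that gives the same information as numbers.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up values for illustration only
demo = pd.DataFrame({"Gender": ["Female", "Male", "Female", "Female", "Male"]})

# Counts and proportions per category
counts = demo["Gender"].value_counts()
proportions = demo["Gender"].value_counts(normalize=True)
print(proportions)

# Bar chart with one bar per category
fig, ax = plt.subplots(figsize=(6, 4))
sns.countplot(data=demo, x="Gender", ax=ax)
ax.set_title("Customers by gender")
```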
We have more female representation in the dataset than male. Finally, we can explore
the relationships between the different variables in the dataset. One great way to do
this is to use the corr() method to show the correlation between the different variables.
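Here is one way the correlation step could look, sketched on made-up values. Restricting the dataframe to numeric columns first is an assumption on my part: recent pandas versions raise an error if corr() is called on a dataframe that still contains a string column such as Gender. The heatmap is an optional extra for readability.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up values for illustration only
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 20, 23, 31, 45, 52, 60],
    "Annual Income (k$)": [15, 40, 16, 75, 17, 60, 88, 30],
    "Spending Score (1-100)": [39, 81, 60, 77, 40, 35, 20, 15],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr_matrix = demo.select_dtypes(include="number").corr()
print(corr_matrix.round(2))

# Optional: show the matrix as an annotated heatmap
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```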
There doesn’t seem to be any correlation between the different variables except for Age
and Spending Score, which share a weak negative correlation.
This concludes our exploratory data analysis. We can now move on to our main task.
One of the key arguments we need to specify in a K-means clustering model is the
number of clusters. The optimal number of clusters varies from dataset to
dataset. Fortunately, there is a tried and tested way to arrive at this number,
known as the elbow method.
We begin by plotting the within-cluster variation in the data (called the
Within-Cluster Sum of Squared Errors, or WCSS) as a function of the number of
clusters, and then pick out the value at the elbow of the curve as the number of
clusters to use.
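For reference, WCSS can be written out explicitly. The notation here is an assumption introduced for illustration and does not come from the original article: C_k is the set of points assigned to cluster k, and mu_k is the centroid of that cluster. This is the quantity that scikit-learn exposes as the inertia_ attribute of a fitted KMeans model.

```latex
\mathrm{WCSS}(K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Because adding clusters can only bring points closer to their nearest centroid, the best achievable WCSS shrinks as K grows, which is why we look for the elbow rather than the minimum.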
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Create a subset of the dataframe with only Annual Income and Spending Score
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

# Compute the WCSS for K = 1 to 10
wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i)
    km.fit(X)
    wcss.append(km.inertia_)  # inertia_ is the WCSS of the fitted model

# Plot the elbow curve
plt.figure(figsize=(12, 6))
plt.plot(range(1, 11), wcss, linewidth=2, color="blue", marker="8")
plt.xlabel("Number of Clusters (K)")
plt.xticks(np.arange(1, 11, 1))
plt.title("The Elbow Method")
plt.ylabel("WCSS")
plt.show()
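If the elbow is hard to judge by eye, one option is to print how much the WCSS falls at each step. The sketch below is not part of the original tutorial: it uses synthetic data from scikit-learn's make_blobs, and the `n_init` and `random_state` values are arbitrary choices made only so the run is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data with 5 groups (assumption: stands in for the real features)
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.6, random_state=0)

k_values = list(range(1, 11))
wcss = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X_demo)
    wcss.append(km.inertia_)

# drops[i] is the fall in WCSS when moving from k_values[i] to k_values[i + 1]
drops = -np.diff(wcss)
for k, drop in zip(k_values[1:], drops):
    print(f"K={k - 1} -> K={k}: WCSS falls by {drop:.1f}")
```

The elbow is roughly the last value of K before the printed drops become small compared with the earlier ones. This remains a judgment call, so treat the numbers as an aid to reading the plot rather than a rule.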
Our challenge now is to determine the optimal value of K from the elbow diagram. The
trick is to identify the value at which the WCSS stops decreasing significantly
compared to previous decreases. In our case, we notice that the drops after K = 5 are
relatively small, so we choose 5 as our optimal value. With this information, we can
now build our model.
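Building the model with K = 5 could be sketched as follows. This is an illustration under stated assumptions, not the article's exact code: the `demo` dataframe is randomly generated, `n_init` and `random_state` are arbitrary choices for repeatability, and the column name `label` is taken from the groupby code later in the tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Randomly generated stand-in data (assumption: not the real dataset)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
X_demo = demo[["Annual Income (k$)", "Spending Score (1-100)"]]

# Fit K-means with 5 clusters and get one cluster index per row
km = KMeans(n_clusters=5, n_init=10, random_state=42)
demo["label"] = km.fit_predict(X_demo)

print(demo.head())
print(km.cluster_centers_)  # one (income, score) centroid per cluster
```

Note that the cluster indices are arbitrary. Without a fixed `random_state`, the same group of customers may be numbered differently from one run to the next.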
And there is our updated dataframe with a label column specifying which segment a
given client belongs to. We can visualize the different segments using a scatterplot.
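A scatterplot of the segments might be drawn along these lines. Again the data is randomly generated and the palette is an arbitrary choice, so the colours and cluster shapes will differ from the figure in the original article.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# Randomly generated stand-in data (assumption: not the real dataset)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
km = KMeans(n_clusters=5, n_init=10, random_state=42)
demo["label"] = km.fit_predict(demo[["Annual Income (k$)", "Spending Score (1-100)"]])

# Colour each point by its cluster label
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=demo,
    x="Annual Income (k$)",
    y="Spending Score (1-100)",
    hue="label",
    palette="tab10",
    ax=ax,
)
ax.set_title("Customer segments")
```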
Now we’re able to see the clusters more clearly. Clients in Cluster 0 (Blue) have the
least income and least spending scores while clients in Cluster 2 (Green) have the most
income and highest spending scores.
As we did previously, we will begin by calculating the values of WCSS, but this time
with the Age column included.
# Create a subset of the dataframe with only Age, Annual Income and Spending Score
X2 = df[["Age", "Annual Income (k$)", "Spending Score (1-100)"]]

# Compute the WCSS for K = 1 to 10
wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i)
    km.fit(X2)
    wcss.append(km.inertia_)

# Plot the elbow curve
plt.figure(figsize=(12, 6))
plt.plot(range(1, 11), wcss, linewidth=2, color="blue", marker="8")
plt.xlabel("Number of Clusters (K)")
plt.xticks(np.arange(1, 11, 1))
plt.title("The Elbow Method")
plt.ylabel("WCSS")
plt.show()
Again, we can select 5 as our optimal value of K. Let’s go ahead and build our second
model.
To see the individual clusters more clearly, we will need to visualize them using a
scatterplot, except this time we will need to create a 3D plot since we are dealing with 3
dimensions or variables.
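One possible way to draw such a 3D scatterplot with matplotlib is sketched below. The model is refitted here on randomly generated three-column data so that the example runs on its own; the figure size, marker size and colormap are arbitrary choices rather than the article's settings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Randomly generated stand-in data (assumption: not the real dataset)
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
km = KMeans(n_clusters=5, n_init=10, random_state=42)
demo["label"] = km.fit_predict(demo[features])

# One axis per variable, with points coloured by cluster label
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    demo["Age"].to_numpy(),
    demo["Annual Income (k$)"].to_numpy(),
    demo["Spending Score (1-100)"].to_numpy(),
    c=demo["label"].to_numpy(),
    cmap="tab10",
    s=40,
)
ax.set_xlabel("Age")
ax.set_ylabel("Annual Income (k$)")
ax.set_zlabel("Spending Score (1-100)")
```

In an interactive window the plot can be rotated with the mouse, which often makes the separation between clusters easier to see than any single fixed angle.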
We can also get a good idea of how the different segments differ by calculating the
average values of the three variables for each segment as well as a count of the number
of clients in each segment. This can be done with the groupby() method.
# Check the count and mean values of all three variables for the different segments
round(df.groupby(by="label")\
    .agg({"CustomerID":"count","Age":"mean","Annual Income (k$)":"mean","Spending Score (1-1
    .reset_index()\
    .rename(columns={"label":"Segment","CustomerID":"No.of Clients"}))
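To show the same groupby pattern end to end, here is a self-contained sketch on six made-up rows. The values and the three hand-assigned labels are hypothetical and are chosen only so that the output is easy to check by hand.

```python
import pandas as pd

# Made-up rows with hand-assigned segment labels, for illustration only
demo = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Age": [20, 24, 40, 44, 60, 64],
    "Annual Income (k$)": [20, 30, 60, 70, 25, 35],
    "Spending Score (1-100)": [80, 70, 50, 40, 10, 20],
    "label": [0, 0, 1, 1, 2, 2],
})

# Count the clients and average the three variables within each segment
summary = (
    demo.groupby(by="label")
    .agg({
        "CustomerID": "count",
        "Age": "mean",
        "Annual Income (k$)": "mean",
        "Spending Score (1-100)": "mean",
    })
    .reset_index()
    .rename(columns={"label": "Segment", "CustomerID": "No.of Clients"})
    .round(1)
)
print(summary)
```

Wrapping the chain in parentheses is a stylistic alternative to trailing backslashes, and the dataframe's own `round()` method does the same job as the built-in `round()` call.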
The results provide a lot of interesting insights. For example, clients in Segment 0 are
the youngest, with a low income but high spending score while clients in Segment 4
are the oldest, with a low income and low spending score. Segment 2 has the largest
number of clients with moderate incomes and moderate spending scores. We can
summarise the different segments as follows:
Using this information, we can go a step further by creating different personas for the
different segments. Then, based on their unique characteristics, we can apply
appropriate growth strategies, which may include loyalty, referral, upselling, and
incentive programs, among others.
And with that, we come to the end of this tutorial. I hope that you learned something
new. If you have any questions or comments, please be sure to leave a note in the
comments section. Thank you very much for reading and all the best in your data
journey.