UNSUPERVISED MACHINE
LEARNING
(CUSTOMER SEGMENTATION)
ONLINE RETAIL
INTRODUCTION
1. The main goal is to identify the most profitable customers and the customers who have churned, so that further customer loss can be prevented by redefining company policies.
2. CLUSTER ANALYSIS: Statistically segment customers into groups using the features described below.
Data Description

Attribute   | Data Type | Description
InvoiceNo   | Nominal   | 6-digit unique number assigned to each transaction; a code starting with the letter 'C' indicates a cancellation
StockCode   | Nominal   | 5-digit unique number assigned to each distinct product
Description | Nominal   | Product (item) name
Quantity    | Numeric   | Quantity of each product (item) per transaction
InvoiceDate | Datetime  | Date and time when each transaction was generated
UnitPrice   | Numeric   | Product price per unit, in sterling
CustomerID  | Nominal   | 5-digit unique number assigned to each customer
Country     | Nominal   | Name of the country where each customer resides
IMPORTING AND INSPECTING DATASET
Dataset name: Online Retail
Number of observations: 541,908 (shape = 541,908 rows × 8 columns)
dtypes: datetime64 (1), float64 (2), int64 (1), object (4) → 1 + 2 + 1 + 4 = 8 columns
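A minimal sketch of the import/inspection step. The file name, CSV format, and encoding in the commented loading line are assumptions (the Online Retail dump is also commonly distributed as an Excel file); `summarize` is a hypothetical helper that reproduces the shape and dtype counts quoted above.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Report the row count, column count, and dtype breakdown of a dataframe."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }

# Hypothetical loading step (file name and encoding assumed):
# df = pd.read_csv("OnlineRetail.csv", encoding="ISO-8859-1",
#                  parse_dates=["InvoiceDate"])
# summarize(df)  # expected: 541,908 rows, 8 columns
```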
Data Cleaning
Checking missing data
1. CustomerID — 135,080 missing values (25%)
2. Description — 1,454 missing values (0.27%)
These rows are of no use for customer-level analysis and can be dropped.
Checking duplicates
5,268 data points were duplicated; the duplicates were dropped.
Total data points left
Number of observations left: 401,604 (shape = 401,604 rows × 8 columns)
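The two cleaning steps above can be sketched as one function, a minimal version assuming the column names from the data description:

```python
import pandas as pd

def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing CustomerID or Description, then drop exact duplicates."""
    out = df.dropna(subset=["CustomerID", "Description"])
    out = out.drop_duplicates()
    return out
```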
FEATURE ENGINEERING
Extracted Year, Date, and Month from InvoiceDate.
Added feature 'TotalAmount' by multiplying the Quantity and UnitPrice columns (in sterling).
Added feature 'TimeType' based on the invoice hour, labelling each transaction as Morning, Afternoon, or Evening.
Dropped invoices whose InvoiceNo starts with 'C', which represent cancellations.
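A sketch of these feature-engineering steps. The exact hour cut-offs for Morning/Afternoon/Evening (12:00 and 17:00) are our assumption, since the deck does not state them:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["Year"] = out["InvoiceDate"].dt.year
    out["Month"] = out["InvoiceDate"].dt.month
    out["TotalAmount"] = out["Quantity"] * out["UnitPrice"]
    # Bin the invoice hour into a coarse time-of-day label
    # (cut-off hours 12 and 17 are assumed)
    out["TimeType"] = pd.cut(out["InvoiceDate"].dt.hour,
                             bins=[0, 12, 17, 24],
                             labels=["Morning", "Afternoon", "Evening"],
                             right=False)
    # Drop cancellations: invoice numbers starting with 'C'
    out = out[~out["InvoiceNo"].astype(str).str.startswith("C")]
    return out
```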
MOST FREQUENT VALUES
Observations/Hypothesis
1. Most customers are from the United Kingdom. A considerable number of customers are also from Germany, France, EIRE, and Spain.
2. There are no orders placed on Saturdays; it appears to be a non-working day for the retailer.
3. Most customers purchased gifts in November, October, December, and September; fewer customers purchased gifts in April, January, and February.
4. Most customers purchased items in the afternoon, a moderate number in the morning, and the fewest in the evening.
5. WHITE HANGING HEART T-LIGHT HOLDER, REGENCY CAKESTAND 3 TIER, and JUMBO BAG RED RETRO SPOT are the most-ordered products.
LESS FREQUENT VALUES
Observations/Hypothesis
1. Saudi Arabia, Bahrain, the Czech Republic, Brazil, and Lithuania have the fewest customers.
2. GREEN WIT METAL BAG CHARM, WHITE WITH METAL BAG CHARM, BLUE/NAT SELL NECLACE W PENDENT, PINK EASTER ENS FLOWER, and PAPER CRAFT LITTLE BIRDIE are some of the least-sold products.
COUNTRY WISE ORDERS
COUNTRY WISE CUSTOMERS
COUNTRY WISE PURCHASE QUANTITY
PRODUCT WISE PURCHASE QUANTITY
PRODUCT WISE REVENUE
PRODUCT WISE CUSTOMERS
CUSTOMER WISE CANCELLATIONS
COUNTRY WISE CANCELLATIONS
VISUALIZING DISTRIBUTIONS
1. Visualizing the distributions of the Quantity, UnitPrice, and TotalAmount columns.
2. Each shows a positively skewed distribution: most of the values are clustered on the left side while the right tail is longer, which means mean > median > mode.
3. For a symmetric distribution, mean = median = mode.
LOG TRANSFORMATION
1. After applying a log transformation, the distribution plots look considerably less skewed.
2. We use a log transformation when continuous data does not follow a bell curve; log-transforming such data makes it as close to "normal" as possible, so that analysis results based on it become more valid.
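The transformation can be sketched as below. Using `log1p` (i.e. log(1 + x)) rather than a plain log is our choice, so that zero values do not map to negative infinity; the columns are assumed to be non-negative:

```python
import numpy as np
import pandas as pd

def log_transform(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Add a log-transformed copy of each given column (suffix '_log')."""
    out = df.copy()
    for c in cols:
        # log1p handles zeros gracefully; assumes non-negative values
        out[c + "_log"] = np.log1p(out[c])
    return out
```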
RFM ANALYSIS
RECENCY: how recently the customer last visited
FREQUENCY: how frequently the customer visits
MONETARY: money spent by the customer
RFM MODELLING
RFM TABLE

Customer Name | Recency | Frequency | Monetary
Anthony       | 326     | 15        | 7183
Rahul         | 2       | 182       | 4310
Syed          | 75      | 31        | 1765
CONCLUSIONS:
Anthony: visited 326 days (approx. 1 year) ago, visited 15 times, and spent around 7,183 sterling → Lost Potential Customer
Rahul: visited 2 days ago, visited 182 times, and spent around 4,310 sterling → Recently Visited Potential Customer
Syed: visited 75 days (approx. 2.5 months) ago, visited 31 times, and spent around 1,765 sterling → About-to-Lose Average Customer
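The RFM metrics can be computed per customer with a pandas groupby — a minimal sketch, assuming the cleaned dataframe carries InvoiceDate, InvoiceNo, and the engineered TotalAmount column, and taking the snapshot date (our assumption) as the day after the last invoice in the data:

```python
import pandas as pd

def rfm_table(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Recency = days since last purchase, Frequency = number of invoices,
    Monetary = total amount spent, per customer."""
    if snapshot_date is None:
        # Assumed reference point: one day after the latest transaction
        snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda s: (snapshot_date - s.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalAmount", "sum"),
    )
```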
RFM MODELLING
1. Earlier, the distributions of the Recency, Frequency, and Monetary columns were positively skewed; after applying a log transformation, the distributions appear symmetrical and approximately normal.
2. The transformed features are more suitable for better visualization of the clusters.
RFM CORRELATION HEATMAP
1. We can see that Recency is highly correlated with the RFM value.
2. Frequency and Monetary are moderately correlated with the RFM value.
SCALING FOR CLUSTERING ANALYSIS
1. Log transformation of the Recency, Frequency, and Monetary features.
2. StandardScaler on the X variables (mean 0, standard deviation 1), followed by clustering analysis and modelling.
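The scaling step (log transform followed by StandardScaler) can be sketched as:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_rfm(X: np.ndarray) -> np.ndarray:
    """Log-transform, then standardize each column to mean 0 and std 1."""
    X_log = np.log1p(X)                      # tame the positive skew first
    return StandardScaler().fit_transform(X_log)
```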
Pipeline
EXTRACTING DATA → DATA CLEANING → DATA VISUALIZATION → RFM ANALYSIS
Extracting data: Online Retail, 541,908 observations (shape = 541,908 × 8).
Data cleaning: checking missing data — CustomerID 135,080 (25%), Description 1,454; checking duplicates — 5,268 duplicated data points dropped; 401,604 data points left.
RFM analysis, condition for best customers: RECENCY must be LESS, FREQUENCY must be MORE, MONETARY must be MORE.
MODELLING → CUSTOMER SEGMENTATION → CONCLUSION
Modelling methods:
1. Binning (RFM score)
2. Binning (RFM combination)
3. K-Means
4. Hierarchical clustering
5. DBSCAN clustering
BINNING RFM SCORES
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 1: LOST POOR CUSTOMERS
GROUP 2: AVERAGE CUSTOMERS
GROUP 3: GOOD CUSTOMERS
GROUP 4: BEST CUSTOMERS
QUANTILE CUT
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 1: LOST POOR CUSTOMERS
GROUP 2: LOSING LOYAL CUSTOMERS
GROUP 3: GOOD CUSTOMERS
GROUP 4: BEST CUSTOMERS
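The quantile-based scoring can be sketched with pandas' `qcut`. The helper name, the use of quartiles (q=4), and summing R+F+M into one score are our assumptions; note that Recency is reversed, since a lower recency is better:

```python
import pandas as pd

def rfm_scores(rfm: pd.DataFrame, q: int = 4) -> pd.DataFrame:
    """Assign 1..q quantile scores per RFM column and a combined RFM score."""
    out = rfm.copy()
    # Lower recency is better, so label quantiles in reverse order
    out["R"] = pd.qcut(out["Recency"], q, labels=list(range(q, 0, -1))).astype(int)
    # rank(method="first") breaks ties so qcut gets distinct bin edges
    out["F"] = pd.qcut(out["Frequency"].rank(method="first"), q,
                       labels=list(range(1, q + 1))).astype(int)
    out["M"] = pd.qcut(out["Monetary"], q, labels=list(range(1, q + 1))).astype(int)
    out["RFMScore"] = out[["R", "F", "M"]].sum(axis=1)
    return out
```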
K-MEANS CLUSTERING
1. From the elbow curve, 5 appears to sit at the elbow and can be considered as the number of clusters; n_clusters = 4 or 6 could also be considered.
2. If we use the maximum silhouette score as the criterion for selecting the optimal number of clusters, then n_clusters = 2 would be chosen.
3. Looking at both graphs together, 4 appears to be a good choice: it has a decent silhouette score and lies near the elbow of the elbow curve.
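Both diagnostics can be computed in one loop — a minimal sketch with scikit-learn, where the k range and seed are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def kmeans_diagnostics(X, k_range=range(2, 9), seed=42):
    """Inertia (for the elbow curve) and silhouette score for each k."""
    results = {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = {
            "inertia": km.inertia_,                       # elbow criterion
            "silhouette": silhouette_score(X, km.labels_),
        }
    return results
```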
K-MEANS | 2 CLUSTERS
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 0: BEST CUSTOMERS
GROUP 1: LOST POOR CUSTOMERS
K-MEANS | 5 CLUSTERS
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 0: LOST POOR CUSTOMERS
GROUP 1: BEST CUSTOMERS
GROUP 2: RECENTLY VISITED AVERAGE CUSTOMERS
GROUP 3: LOSING LOYAL CUSTOMERS
GROUP 4: AVERAGE CUSTOMERS
K-MEANS | 4 CLUSTERS
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 0: LOSING LOYAL CUSTOMERS
GROUP 1: BEST CUSTOMERS
GROUP 2: LOST POOR CUSTOMERS
GROUP 3: RECENTLY VISITED AVERAGE CUSTOMERS
HIERARCHICAL CLUSTERING
In K-means clustering there is the challenge of predetermining the number of clusters, and the algorithm tends to create clusters of similar size. To address these two challenges, we can opt for hierarchical clustering, because this algorithm does not require a predefined number of clusters.
Hierarchical clustering is based on two techniques:
a. Agglomerative: a bottom-up approach, in which the algorithm starts by taking every data point as a single cluster and keeps merging clusters until only one is left.
b. Divisive: the reverse of the agglomerative algorithm; a top-down approach.
We chose the optimal number of clusters based on the dendrogram shown here.
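The agglomerative approach can be sketched with SciPy: build the merge tree with `linkage`, inspect it as a dendrogram, then cut it at the chosen number of clusters. The helper name and Ward linkage are our assumptions, since the deck does not name the linkage method:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def hierarchical_labels(X: np.ndarray, n_clusters: int = 2,
                        method: str = "ward") -> np.ndarray:
    """Agglomerative (bottom-up) clustering: build the merge tree, then cut it."""
    Z = linkage(X, method=method)
    # scipy.cluster.hierarchy.dendrogram(Z) would draw the tree used to
    # choose n_clusters by eye, as described above.
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```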
HIERARCHICAL | 2 CLUSTERS
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 0: AVERAGE CUSTOMERS
GROUP 1: BEST CUSTOMERS
HIERARCHICAL | 3 CLUSTERS
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP 0: BEST CUSTOMERS
GROUP 1: LOSING LOYAL CUSTOMERS
GROUP 2: LOST POOR CUSTOMERS
DBSCAN
(3-D scatter of the clusters on the Recency, Frequency, and Monetary axes)
GROUP -1: AVERAGE CUSTOMERS (DBSCAN's noise label)
GROUP 0: LOST POOR CUSTOMERS
GROUP 1: GOOD CUSTOMERS
GROUP 2: LOSING LOYAL CUSTOMERS
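A minimal DBSCAN sketch with scikit-learn; the `eps` and `min_samples` values are our assumptions, since the deck does not report the ones used. Points DBSCAN cannot assign to any dense region get the label -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_labels(X: np.ndarray, eps: float = 0.5,
                  min_samples: int = 5) -> np.ndarray:
    """Density-based clustering; label -1 marks noise points."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
```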
SUMMARY
▪ We started with simple binning- and quantile-based segmentation models, then moved to more complex models, because a simple implementation gives a first glance at the data and shows where and how to exploit it better.
▪ We then moved to k-means clustering and visualized the results with different numbers of clusters. Since there is no assurance that k-means will find the globally best solution, we also tried hierarchical clustering and DBSCAN.
▪ We created several useful clusters of customers, using different metrics and methods, to categorize customers by their behavioural attributes and define their value, loyalty, profitability, etc. for the business. Although clearly separated clusters are not visible in the plots, the clusters obtained are fairly valid and useful according to the algorithms and the statistics extracted from the data.
▪ The final segments depend on how the business plans to use the results and on the level of granularity it wants in the clusters. Keeping these points in view, we clustered the major segments, based on our understanding, according to the different criteria shown in the summary dataframe.
FINAL CONCLUSION
CUSTOMER SEGMENTS OBTAINED FROM CLUSTERING ANALYSIS