Customer Segmentation Using Data Science

Uploaded by Hritika Shahu
Copyright © All Rights Reserved

CUSTOMER SEGMENTATION USING DATA SCIENCE

Abstract: Customers have always been, and always will be, the center of the market and its most important aspect. Over time, the diverse nature of customers, and the way they affect the market and product schemes as a whole, has come to the forefront, especially through the large volumes of data available today. Most customers prefer online shopping, especially since the pandemic, which has increased the data available about customers, their interests, and their characteristics, and has helped companies understand their needs better. One way to serve customers better is customer segmentation. Customer segmentation is the technique of forming clusters of customers based on different aspects of their already collected data, such as gender, region, or age. Practicing customer segmentation helps a company understand its potential audience better, which in turn helps it design marketing schemes that target specific segments and boost product growth. Here we have implemented customer segmentation with these goals in mind and built a system that helps anticipate the products a customer might be willing to buy, using K-means clustering with the elbow method. The system is implemented with the K-means method in Python, using machine learning and data science approaches. The dataset used provides real-time data about the products bought, along with other important information.

Keywords: Customer segmentation, K-means, Euclidean distance, Classification, Data science, Machine learning

I. INTRODUCTION

Customers are the people who purchase and use a firm's goods. Customers have wants, and they ultimately decide whether the company's product meets those demands. Customers represent the company's market share, and they generate its sales and profits. Customer loyalty to a product or service is advantageous, since loyal customers will keep coming back for the things they want [1]. To make a business better and more flourishing, knowing your customers is essential. This can range from knowing the gender of your customers to knowing which shift in marketing scheme drew the most customer attention to your product. Knowing your customers is important, but being able to attract more customers is even more important.

We know that good reviews and better-quality products make customers return to a business, and as word spreads, new customers can flock in. Analysing all of this and classifying customers into circumstance- and need-based clusters can help a businessperson manage their reach better. Because the market can change overnight, analysis is the most important factor in staying ahead, and customer segmentation helps achieve it. Customer segmentation refers to the process of determining how to interact with consumers in various groups so as to amplify the value of each customer to the company. Marketers can use customer segmentation to reach each consumer in the most efficient way feasible [2-3]. Through these groups, customers' behavioural, demographic, and other patterns can be recognised, which can improve the business and pave the way for more profit; in the end, any business's main goal is to make as much profit as possible. Segmentation can also help attract new potential customers. This requires a strong marketing strategy, and that strategy can be designed for a specific group of customers based on the characteristics of their cluster. Clustering has proved to be a good way to implement customer segmentation. Clustering, the ability to uncover categories in unlabelled datasets, is an example of unsupervised learning. K-means, hierarchical clustering, DBSCAN clustering, and other approaches are among the clustering methods available [4]. Through this project we develop a model that classifies customers from a dataset of user purchases into segments. By further working on these segments, we anticipate the purchases that a new customer will make during the following year.

II. LITERATURE SURVEY

In [5], the customer lifetime value (LTV) model and the recency, frequency, and monetary (RFM) model are suggested for assessing customer classification for campaign management. A general technique is also provided for locating more suitable customers for each marketing approach, since targeting and segmenting customers is the core of the marketing plan. A Nissan car retailer dataset of more
than 4000 clients is used to assess the suggested method's effectiveness. For client targeting, the suggested model is more effective than random selection.

In [6], the RFM, Kano, and BG/NBD models are suggested for consumer segmentation on e-commerce websites. Based on factors such as customer lifetime value, customer satisfaction, and customer engagement, customers are divided into ten groups in accordance with the marketing strategies. By categorizing and targeting customers, segmentation increases companies' earnings.

In [7], an RFM paradigm is suggested. To create predictable segment-level models, a chosen data model is implemented with pattern-based clustering and signature-finding methods. Credit card consumption data is used, and the methods are applied to produce a financial matrix and a fluctuate-rate matrix that assist in investigating various modes. Clustering on these two vectors is used to examine various customer traits, and a two-dimensional, consumption-based customer segmentation model is created from these factors.

In [8], a new RFM-based model, LRFMP (Length, Recency, Frequency, Monetary, and Periodicity), is used in the grocery store business to categorize customers and find various client segments. Customers are segmented using real-world data from a Turkish grocery chain and a combination of the LRFMP model and a clustering method. This research gives researchers and practitioners a simple path to useful insights for identifying customer profiles based on the LRFMP model, and primarily helps decision makers develop useful customer relationships and distinctive marketing strategies to reach a wider customer base, with the grocery store business as the example.

In [9], the network of online banking customers has expanded rapidly in recent years, and consumer segmentation of unstructured transactional data using clustering algorithms is urgently needed. The most popular clustering methods, K-means and K-medoids, are applied to datasets based on the RFM scores of customers' online banking activities. When the effectiveness of both methods was evaluated and compared, the K-means strategy outperformed the K-medoids method based on intra-cluster distance.

In [10], neural networks and support vector machines are suggested as machine learning methods to segment clients in the private banking industry. Customers are divided into groups based on factors such as character identification, credit default, fraud forecasting, and the outlook for the foreign exchange market. Customer segmentation is the first stage towards achieving the most lucrative company growth in the private banking industry.

In [11], the authors established user segmentation on pertinent metrics from the viewpoints of network operators, handset makers, and application writers. They used latent class analysis to analyze the results of a smartphone measurement study with 129 subjects. The information is then linked to psychographic and socioeconomic information to help explain user behaviour. Different service groups can be identified in terms of network traffic (phone, SMS, and internet) and the use of content services (i.e. applications and URLs).

In [12], the conventional K-means clustering method is thoroughly examined, and a modelling procedure based on the least-squares idea is presented for telco consumer segmentation. A clustering technology based on the K-means method is provided for Changzhou Telecom in Jiangsu Province, and real results demonstrate that it offers an effective and successful solution to customer segmentation for the telecom operator, bringing services closer to customers.

In [13], a decision tree analysis method is applied in full-service restaurants to classify patrons according to their dining preferences. When making purchases, consumers are differentiated based on five factors: menu, ambiance, pricing, health, and brand. A total of 390 surveys were used to gather the data. With decision tree analysis, a researcher can target and locate suitable consumers; five customer segments are derived based on a series of criteria. Restaurant administrators and advertisers can benefit from this customer segmentation.

In [14], the authors note that maintaining customer loyalty and attention span is presently one of the biggest challenges confronting the retail industry, as marketing methods are constantly evolving. The variables that have the biggest effects on historical correlation are determined from sales information acquired through transactions. Based on groups,
suitable resources can be allocated. Machine learning is used to route traffic to algorithms for happy consumers; Singular Value Decomposition is used for making offers, and K-means clustering is used for client classification.

III. PROPOSED SYSTEM

The proposed system for customer segmentation is shown in the following figure:

Fig1 The flow of proposed system

A. Defining Objective:

Dividing customers into different groups can provide various opportunities for generating more revenue. This approach can enhance multiple areas, such as budget planning, product development, advertising, promotion, marketing strategies, and customer satisfaction.

B. Data Preparation:

A real-time dataset was collected from Kaggle. It has 137839 entries about user purchases.

C. Exploratory Data Analysis:

The data first needs to be pre-processed and analysed before we can work with it. In this step, we perform various operations, such as removing null entries and duplicate entries. We also explore the data by analysing attributes such as countries, customers, and products; this will help us while forming clusters later in the project.

D. Creating clusters:

Here we use the K-means algorithm from scikit-learn to define clusters of customers. We have formed 11 clusters of customers. We have used different attributes, such as the clusters of customers together with their amount spent, to get a better visualization of our clusters.

E. Classification:

Here, we fit a classifier that assigns consumers to the different client categories decided previously. We use the following methods:

i. Support Vector Machine Classifier (SVC)

SVC performs classification by mapping data points into a high-dimensional space. It then searches for the best hyperplane that can effectively split the data into two distinct classes. With more than two categories, as here, several such binary classifiers are combined; scikit-learn's SVC, for example, uses a one-vs-one scheme.

ii. Confusion Matrix

A confusion matrix displays the results and predictions of a classification problem in tabular format, aiding in the visualization of the outcomes. It is a table that shows the predicted and actual values of a classifier.

iii. Learning Curve

Learning curves can be used to detect possible drawbacks in the model, such as over-fitting or under-fitting.

Therefore, we can use the above-mentioned methods for classifying the data. Now, to predict the products, we use different algorithms, and for each algorithm we plot a learning curve showing its score as a function of the number of training examples. This way we can find which algorithm is best for our model.

1) Logistic Regression

When the model is fitted with our training data using logistic regression, the precision is 86.84%.
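The classification step (an SVC, a confusion matrix, and a learning curve) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the feature matrix is synthetic (from scikit-learn's `make_blobs`) and stands in for the per-customer features, the labels play the role of the 11 K-means segments, and the settings (RBF kernel, `C=1.0`, `max_iter=1000`, five training-set sizes, 5-fold cross-validation) are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the per-customer feature table: 1100 customers with
# 4 numeric features, each labelled with one of 11 segment ids (0..10).
X, y = make_blobs(n_samples=1100, n_features=4, centers=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# i. Support Vector Machine Classifier; scaling sits inside the pipeline so
# the scaler is fitted on training data only.
svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)

# ii. Confusion matrix: row i, column j counts customers of actual segment i
# that were predicted as segment j.
cm = confusion_matrix(y_test, y_pred, labels=np.arange(11))
print(cm)
print("SVC weighted precision:",
      precision_score(y_test, y_pred, average="weighted", zero_division=0))

# iii. Learning curve for logistic regression: mean accuracy on the training
# subset and on the validation folds, for increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    X_train,
    y_train,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} training examples: train={tr:.3f}  validation={va:.3f}")
```

When reading the printed curve, a large and persistent gap between the training and validation scores points to over-fitting, while two low curves that converge point to under-fitting.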
Fig2 Learning curve for Logistic Regression

2) K Nearest Neighbor

When the model is fitted with our data using KNN, the precision is 81.72%.

Fig3 Learning curve for KNN

3) Decision Tree

When the model is fitted with the training data using a decision tree, the precision comes out to be 85.73%.

Fig4 Learning curve for Decision Tree

4) Random Forest

When the model is fitted using a random forest, the obtained precision is 89.34%.

Fig5 Learning curve for Random Forest

5) AdaBoost Classifier

When the model is fitted using the AdaBoost classifier, the precision is 55.96%.

Fig6 Learning curve for AdaBoost Classifier

6) Gradient Boosting

When the model is fitted using the gradient boosting classifier, the precision is 89.47%.

Fig7 Learning curve for Gradient Boosting

F. Testing Predictions:

Several classifiers were trained in the preceding section to classify customers, using data from the first 10 months. This section evaluates the model by testing it against the last two months of the dataset.
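This evaluation protocol can be sketched as follows, assuming a transaction table with hypothetical columns `CustomerID`, `InvoiceDate` and `Amount` filled with synthetic data, and a pre-assigned segment per customer standing in for the K-means labels. Transactions before a cutoff ten months after the first date are used for training and the remaining two months for testing; the classifiers listed above, plus a hard-voting ensemble of Random Forest, Gradient Boosting and KNN (the combination reported in the results), are fitted on per-customer features from the first period and scored by weighted precision on the second. All hyperparameters are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import precision_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic transaction log (hypothetical columns): 400 customers, each with a
# fixed segment id in 0..3 standing in for its K-means label; the typical
# purchase amount grows with the segment id.
n_customers, n_tx = 400, 20_000
segment_of = rng.integers(0, 4, n_customers)
cust = rng.integers(0, n_customers, n_tx)
tx = pd.DataFrame({
    "CustomerID": cust,
    "InvoiceDate": pd.Timestamp("2011-01-01")
    + pd.to_timedelta(rng.integers(0, 365, n_tx), unit="D"),
    "Amount": rng.gamma(2.0, 5.0 * (1 + segment_of[cust])),
})

# Chronological split: first 10 months to train, last 2 months to test.
cutoff = tx["InvoiceDate"].min() + pd.DateOffset(months=10)
train_tx = tx[tx["InvoiceDate"] < cutoff]
test_tx = tx[tx["InvoiceDate"] >= cutoff]


def customer_features(t):
    """Per-customer mean and median purchase amount within one period."""
    return t.groupby("CustomerID")["Amount"].agg(["mean", "median"])


X_train, X_test = customer_features(train_tx), customer_features(test_tx)
y_train = segment_of[X_train.index.to_numpy()]
y_test = segment_of[X_test.index.to_numpy()]

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
# Hard voting: each of the three models casts one vote per customer and the
# most-voted class wins (ties go to the lowest class label). VotingClassifier
# fits its own clones of the models.
models["Voting (RF+GB+KNN)"] = VotingClassifier(
    estimators=[
        ("rf", models["Random Forest"]),
        ("gb", models["Gradient Boosting"]),
        ("knn", models["KNN"]),
    ],
    voting="hard",
)

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = precision_score(y_test, y_pred, average="weighted", zero_division=0)
    print(f"{name:20s} weighted precision on last two months: {scores[name]:.3f}")
```

Only scale-free features (mean and median amount) are used in this sketch, because counts or totals accumulated over ten months are not comparable with the same quantities over two months.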

IV. RESULTS AND ANALYSIS

A. Loading Dataset

The dataset consists of 137839 entries of user data, including each user's country, purchases, amount spent, etc.

Fig8 The dataset

B. Data Preprocessing

The data was cleaned by eliminating null values from all columns and removing unnecessary features. Additionally, any remaining missing values were replaced with the mean value of the respective column.

Fig9 The dataset after cleaning

C. Distribution of Data

The distribution of the preprocessed data is shown in different forms below.

Fig10 Distribution based on order amounts by customers

Fig11 Distribution based on the frequency of words occurring in the product description

From distributions such as the ones above, we formed 11 clusters of products.

D. Testing Predictions

The task at hand involves using data from a two-month period to assign customers to specific categories, which can then be used to evaluate the classifier's predictions. To assign customers to categories, the K-means method used in section 4 is employed. The predict method of this instance computes the distance between each customer and the centroids of the 11 different customer classes, and the customer is assigned to the category with the smallest distance.

Fig12 The results
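The clustering and assignment logic can be sketched with scikit-learn's `KMeans` as follows. Assumptions: the customer features are synthetic (three numeric columns from `make_blobs`), they are standardised before clustering, and the range of k tried for the elbow method (2 to 15) is arbitrary; k = 11 is then fixed to match the text. The last lines check that `predict` is equivalent to choosing the centroid with the smallest Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the per-customer features (three numeric columns).
X_raw, _ = make_blobs(n_samples=2005, n_features=3, centers=11, random_state=42)
X_fit_raw, X_new_raw = X_raw[:-5], X_raw[-5:]  # last 5 rows play "new" customers

# Standardise so no single feature dominates the Euclidean distance.
scaler = StandardScaler().fit(X_fit_raw)
X_fit, X_new = scaler.transform(X_fit_raw), scaler.transform(X_new_raw)

# Elbow method: within-cluster sum of squared distances (inertia) for each k.
# The chosen k is where the curve bends and extra clusters stop paying off.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_fit).inertia_
    for k in range(2, 16)
}
for k, w in inertias.items():
    print(f"k={k:2d}  inertia={w:10.1f}")

# Final model with 11 clusters, as in the text.
kmeans = KMeans(n_clusters=11, n_init=10, random_state=0).fit(X_fit)

# predict() assigns each new customer to the nearest centroid ...
labels = kmeans.predict(X_new)

# ... which is the same as taking the argmin of Euclidean distances by hand.
dists = np.linalg.norm(X_new[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
manual = dists.argmin(axis=1)
print("predict():", labels, " manual argmin:", manual)
```

The new points must be transformed with the scaler fitted on the clustering data; refitting the scaler on the two-month data would shift the space and make the centroid distances meaningless.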


As previously suggested in the revised approach, the quality of the classifier can be enhanced by combining the predictions from each classifier. In this regard, we opted to merge the predictions from Random Forest, Gradient Boosting, and k-Nearest Neighbors, which resulted in a slight improvement in the overall predictions. Hence the precision of the combined predictions comes out to be 76.83%.

V. CONCLUSION

In this project we developed a customer segmentation system using several data science and machine learning techniques, such as K-means, together with evaluation tools such as SVC, confusion matrices, and learning curves, and classifiers such as logistic regression and KNN, to find the best-quality classifier by combining the predictions from each classifier. This project dealt with a real-time dataset that has 137839 entries about customers from different geographical, age, and gender groups, with information about their purchased products along with the product descriptions. This data and our system helped us anticipate the products that can be purchased by customers already in our system, and even products that can be purchased by new customers. Companies can use this to improve their selling efficiency and to utilize their inventory better. It could also help companies market some products better and to the right audience.

VI. REFERENCES

[1] R. Gupta and C. Pathak, "A machine learning framework for predicting purchase by online customers based on dynamic pricing," Procedia Comput. Sci., vol. 36, no. C, pp. 599–605, 2014, doi: 10.1016/j.procs.2014.09.060.

[2] V. Kedia, S. R. Regmi, K. Jha, A. Bhatia, S. Dugar, and B. K. Shah, "Time Efficient IOS Application for CardioVascular Disease Prediction Using Machine Learning," Proc. - 5th Int. Conf. Comput. Methodol. Commun. ICCMC 2021, no. Iccmc, pp. 869–874, 2021, doi: 10.1109/ICCMC51019.2021.9418453.

[3] A. Bhatia, V. Kedia, A. Shroff, M. Kumar, B. K. Shah, and Aryan, "Fake currency detection with machine learning algorithm and image processing," Proc. - 5th Int. Conf. Intell. Comput. Control Syst. ICICCS 2021, no. Iciccs, pp. 755–760, 2021, doi: 10.1109/ICICCS51141.2021.9432274.

[4] Swesh Raj Regmi, Jasraj Meena, Utkarsh Kanojia, Vishesh Kant, "Customer Market Segmentation using Machine Learning Algorithm," Proceedings of the Sixth International Conference on Trends in Electronics and Informatics (ICOEI 2022), IEEE Xplore Part Number: CFP22J32-ART; ISBN: 978-1-6654-8328-5.

[5] Radhika, V., Prasad, C. R., & Chakradhar, A. (2022, January). Smartphone-Based Human Activities Recognition System using Random Forest Algorithm. In 2022 International Conference for Advancement in Technology (ICONAT) (pp. 1-4). IEEE.

[6] Chan, Chu Chai Henry. "Intelligent value-based customer segmentation method for campaign management: A case study of automobile retailer." Expert Systems with Applications 34.4 (2008): 2754-2762.

[3] He, Xixi; Li, Chen (2016). "The Research and Application of Customer Segmentation on E-Commerce Websites." 2016 6th International Conference on Digital Home (ICDH), Guangzhou, China (2016.12.2-2016.12.4), IEEE, pp. 203–208. doi:10.1109/ICDH.2016.050

[4] Wu, Jing; Lin, Zheng (2005). "Research on customer segmentation model by clustering." Proceedings of the 7th International Conference on Electronic Commerce (ICEC '05), Xi'an, China (2005.08.15-2005.08.17), ACM Press, 316–. doi:10.1145/1089551.1089610

[5] Peker, Serhat, Altan Kocyigit, and P. Erhan Eren. "LRFMP model for customer segmentation in the grocery retail industry: a case study." Marketing Intelligence & Planning (2017).

[6] Aryuni, Mediana; Didik Madyatmadja, Evaristus; Miranda, Eka (2018). "Customer Segmentation in XYZ Bank Using K-Means and K-Medoids Clustering." 2018 International Conference on Information Management and Technology (ICIMTech), DKI Jakarta, Indonesia (2018.9.3-2018.9.5), IEEE.

[7] Smeureanu, Ion, Gheorghe Ruxanda, and Laura Maria Badea. "Customer segmentation in private banking sector using machine learning techniques." Journal of Business Economics and Management 14.5 (2013): 923-939.

[8] Hamka, Fadly, et al. "Mobile customer segmentation based on smartphone measurement." Telematics and Informatics 31.2 (2014): 220-227.

[9] Cai Qiuru; Luo Ye; Xi Haixu; Liu Yijun; Zhu Guangping (2012). [IEEE 2012 International Conference on Computer Science and Information Processing (CSIP) - Xian, Shaanxi, China (2012.08.24-

[10] Hwang, Jinsoo, et al. "Customer segmentation based on dining preferences in full-service restaurants." Journal of Foodservice Business Research 15.3 (2012): 226-246.

[11] Bhade, Kalyani, et al. "A Systematic Approach to Customer Segmentation and Buyer Targeting for Profit Maximization." 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE, 2018.
