
IEEE Conference Template 5

This document proposes an improved machine learning-based customer segmentation framework for e-commerce sites. The framework has five components: cluster tendency test, building a clustering model, comparing K-means and hierarchical clustering, comparing K-means and DBSCAN, and providing insights and recommendations. The aim is to identify different customer types like best, potential loyal, and at-risk customers to help e-commerce sites target customer groups that will provide more revenue and apply new marketing strategies to promote growth. Clustering algorithms like K-means, hierarchical clustering, and DBSCAN are evaluated on a customer dataset to segment customers for personalized recommendations.

Uploaded by

ishrat jahan

An Improved Machine Learning Based Customer Segmentation for Insight and Recommendation in E-commerce

Abstract—Customer segmentation has developed into one of the most important and practical strategies for e-marketing and e-commerce in recent years. It is essential to online product recommendation systems and aids in the understanding of the local and international wholesale and retail markets. Customer segmentation is the process of categorizing customers based on shared traits such as gender, age, location, and ratings. Clustering algorithms support this process by grouping like-minded customers into the same segment. The most commonly used clustering methods are K-means, density-based, and hierarchical clustering.

In this paper, an improved customer segmentation framework has been developed using the best clustering model for insight and recommendation. The framework has five components: the cluster tendency test, building a clustering model, comparison between K-means and hierarchical clustering, comparison between K-means and DBSCAN, and insight and recommendation.

Index Terms—Customer segmentation, hierarchical clustering, clustering algorithms, density-based clustering, distortion score.

I. INTRODUCTION

The emergence of the information age and the rapid advancement of computer network technology have fundamentally altered the nature of market competition. In addition to being quicker and more convenient in terms of time and space, the internet-based business model also, to a significant extent, gives businesses an effective way to gather customer resources and market knowledge [1]. E-commerce websites provide a very beneficial platform for the online buying and selling of a variety of goods from various locations. E-commerce websites have a specific marketing objective: to engage with customers. Every customer is unique, and a problem that comes up is information overload caused by the many products offered by e-commerce [2]. To overcome this information overload, customer segmentation needs to be implemented in e-commerce services. The traditional method of customer segmentation is relatively simple and coarse and cannot cater well to the market's business model [3]. For an e-commerce site, an appropriately chosen machine learning algorithm can extract the salient features of customer behavior and use these features to divide customers into distinct groups. Cluster analysis is a method frequently employed in machine learning to obtain a differentiated interpretation of the groups in the data. It is mostly used in the analysis of enterprise data to reveal the distribution characteristics present in large datasets (Oña et al. 2016).

The aim of this paper is to determine the types of customers, such as best customers, potential loyal customers, and at-risk customers, and to determine the value of customers, so that e-commerce sites can decide which types of customer will provide robust revenue and which will not, and what new market strategy they can apply to promote their growth in revenue.

II. RELATED WORK

Customer segmentation has been applied by several researchers in a variety of fields. One study investigated a cluster-based methodology for customer segmentation and suggested a hierarchical clustering method, named HACNJ, which is based on the Q-criterion [4]. Another showed that an online store can recognize its customers by using data mining, so that customers receive personalized services and marketing strategies suited to their requirements [5]. To analyze customer characteristics efficiently, a further method took a retail supermarket as the research object: data mining techniques were used to identify the retail enterprise's customer segments, and association rules obtained using the Apriori algorithm were applied to the various customer groups [6].

III. BACKGROUND OF THE STUDY

The theoretical concepts that underlie this study, such as clustering techniques, are briefly described here.

A. Clustering Algorithm
The task of grouping a collection of objects so that each group contains only objects of the same type is known as cluster analysis or clustering. Cluster analysis methodologies have been covered in detail in many studies (see [1], [4], [5], [8], [16]). Three algorithms, k-means, DBSCAN, and hierarchical clustering, are the main focus of our study.

1) K-means: The k-means algorithm divides the data into a specified number of clusters. Initial cluster centers are chosen at random. In each iteration, every observation is reassigned to the closest cluster center, and the cluster centers are then recalculated. The process is repeated until every observation is assigned to its nearest cluster center.

2) DBSCAN Algorithm: The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is founded on the common understanding of "clusters" and "noise". The main principle is that, for each point within a cluster, at least a certain number of points must be present within a particular radius of it. Instead of estimating the number of clusters, DBSCAN requires two hyperparameters, epsilon and MinPts, to be defined.

• Epsilon (ϵ): The radius used to locate neighboring points and determine the density around any given point.
• MinPts (n): The minimum number of points necessary to form a cluster:

MinPts(n) ≥ D + 1

where D is the number of dimensions of the dataset.

3) Hierarchical Clustering: Hierarchical clustering (also known as hierarchical cluster analysis, or HCA) is another unsupervised machine learning approach used to cluster unlabeled datasets. There are two varieties of hierarchical clustering: agglomerative hierarchical clustering and divisive hierarchical clustering.

IV. METHODOLOGY

First, the required libraries are imported and the data is loaded. Clustering models are then built with the K-means, hierarchical clustering, and DBSCAN algorithms, and the models are compared. After the comparison, the best model is selected; in this study, the k-means clustering algorithm was applied for customer segmentation. Fig. 1 shows the integrated customer segmentation framework.

Fig. 1. Integrated Customer Segmentation Framework

V. EXPERIMENTAL RESULT ANALYSIS

A. Setup

The experiment was carried out in Jupyter Notebook with Python 3 on an x64-based Intel Core i5 processor. The installed library versions were pandas 1.4.3, NumPy 1.20.3, and seaborn 0.12.0.

B. Dataset

The data was compiled into an Excel file. Some customer data were collected from the e-commerce website and some were updated, including details from customer registration, login, and ratings. The dataset has 400 rows and 7 attributes, which are listed in Table I.

TABLE I
FEATURES OF DATA
No  Feature              Description
1   CustomerId           Unique customer ID
2   Age                  Age of customer
3   Annual Income (k$)   Annual income of customer
4   Gender               Gender of customer (male or female)
5   ratings              Ratings of customer on basis of service
6   Age                  Age of customer
7   Location             Location of customer

C. Cluster Tendency Test

The main purpose of the cluster tendency test is to determine whether clustering is applicable, so a cluster tendency test was carried out. The Hopkins test is a well-known test for cluster tendency. It determines whether or not observations are spread randomly across the data space.

Hopkins Score: 0.09322634776929459

Under the convention used here, a dataset is acceptable for clustering when it has a Hopkins score of less than 0.5, and a dataset with a score of more than 0.5 is unsuitable for clustering. The dataset was therefore appropriate for clustering, because its score was well below the cutoff.

D. Building the Clustering Model

Many clustering methods, such as K-means and hierarchical clustering, require the number of clusters in advance; however, determining the optimal number can be challenging. For any cluster analysis, the optimal number of clusters can be found in a variety of ways. The Elbow Method was employed in this instance.

• Distortion score: Fig. 2 demonstrates how the distortion score decreases as the number of clusters rises. However, no obvious "elbow" is visible. Four clusters were recommended by the underlying algorithm.

Fig. 2. Distortion score for clustering

• Silhouette score: Plotting the silhouette score as a function of the number of clusters is another method for determining the ideal number of clusters. The silhouette score measures how well samples are clustered with other samples that are similar to them, in order to assess the quality of clusters produced by clustering algorithms like K-Means. Each sample's silhouette score, S, is computed using the following formula:

$$S = \frac{b - a}{\max(a, b)} \tag{1}$$

where
a = the mean intra-cluster distance, i.e. the average distance from the sample to the other points in its own cluster;
b = the mean nearest-cluster distance, i.e. the average distance from the sample to the points of the nearest cluster it does not belong to.

Analyzing the silhouette scores shows that 4 clusters gives the highest score.

Fig. 3. Silhouette scores

• WCSS: The within-cluster sum of squares (WCSS) is the total squared distance between each observation and its centroid. A small WCSS indicates that every data point is close to its nearest centroid, i.e. that the model has produced good results.

$$\mathrm{WCSS} = \sum_{i \in n} (X_i - Y_i)^2 \tag{2}$$

where Y_i is the centroid for observation X_i.

In Fig. 4, it is observed that, according to the elbow model, segmenting customers into four categories would be optimal. The optimal number of clusters was therefore 4. The K-means clustering model and the hierarchical clustering model were developed after determining the optimal number of clusters.

Fig. 4. Determine optimal number of clusters

E. Comparison Between K-means and Hierarchical Clustering

Fig. 5. Segmentation using Age and Ratings

The Calinski-Harabasz (CH) index or score, often referred to as the Variance Ratio Criterion, is the ratio of the between-cluster dispersion to the within-cluster dispersion, summed over all clusters. The higher the score, the better the performance: a high CH score indicates that the observations within each cluster are closely spaced (dense) and that the clusters are well separated.

Fig. 6. Segmentation using Annual Income, Age and Ratings

From Fig. 5 and Fig. 6, it is found that K-Means outperformed hierarchical clustering on both the Calinski-Harabasz score and the silhouette score.

F. Comparison Between K-means and DBSCAN

TABLE II
COMPARISON BETWEEN K-MEANS AND DBSCAN
Cluster  K-means Size  DBSCAN Size
0        110           400
1        163           NaN
2        30            NaN
3        97            NaN

Table II shows that DBSCAN was unable to produce meaningful clusters: it placed all 400 customers in a single cluster. DBSCAN gives less-than-ideal results when one of the clusters is less dense than the others, because it will not classify the least dense group as a cluster.
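The sensitivity of DBSCAN to density and to the choice of epsilon can be illustrated with a small, dependency-free sketch. It is not the code used in the study: the toy points, the epsilon values, and the helper names (`region_query`, `dbscan`, `cluster_sizes`) are assumptions made for illustration, and in practice a library implementation would normally be used.

```python
from collections import Counter
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Basic DBSCAN; returns one label per point (NOISE for outliers)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # points[i] is a core point
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


def cluster_sizes(labels):
    """Number of points per label, as a plain dict."""
    return dict(Counter(labels))


# Toy data (assumed): a dense 5x5 grid with spacing 0.5 and a sparse
# 3x3 grid with spacing 2, placed far apart.
dense = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(5)]
sparse = [(10.0 + 2 * i, 10.0 + 2 * j) for i in range(3) for j in range(3)]
points = dense + sparse

# MinPts = D + 1 = 3 for two-dimensional data.
for eps in (0.6, 2.1, 20.0):
    print(eps, cluster_sizes(dbscan(points, eps=eps, min_pts=3)))
```

With the small radius only the dense group forms a cluster and the sparse group is labeled as noise; with the intermediate radius both groups are recovered; with the very large radius every point falls into one cluster, which resembles the DBSCAN column of Table II.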
VI. INSIGHT AND RECOMMENDATION

According to the findings of the analysis, the customers fall into four groups, or segments, each of which can be targeted with its own promotions.
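Segment profiles of the kind discussed in this section can be obtained by grouping customers by cluster label and summarizing each group. The sketch below is only an illustration under stated assumptions: the records, the field names, and the `profile_segments` helper are hypothetical and do not reproduce the study's data.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import NamedTuple


class Customer(NamedTuple):
    cluster: int    # cluster label assigned by the clustering model
    age: int
    income_k: int   # annual income in k$
    rating: int
    gender: str


# Hypothetical records, invented for this example.
customers = [
    Customer(0, 45, 30, 5, "Male"),
    Customer(0, 43, 28, 4, "Female"),
    Customer(1, 34, 90, 3, "Male"),
    Customer(1, 35, 85, 3, "Female"),
    Customer(1, 33, 95, 3, "Male"),
    Customer(2, 60, 88, 1, "Female"),
    Customer(3, 26, 55, 3, "Male"),
    Customer(3, 27, 60, 3, "Female"),
]


def profile_segments(rows):
    """Summarize size, mean age, income, rating and gender split per cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.cluster].append(row)
    profiles = {}
    for label, members in sorted(groups.items()):
        profiles[label] = {
            "size": len(members),
            "mean_age": mean(m.age for m in members),
            "mean_income_k": mean(m.income_k for m in members),
            "mean_rating": mean(m.rating for m in members),
            "gender": dict(Counter(m.gender for m in members)),
        }
    return profiles


profiles = profile_segments(customers)
for label, summary in profiles.items():
    print(label, summary)
```

Each resulting profile can then be read against the business interpretation of a segment, for example a low-income, high-rating group versus a high-income, low-rating group.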

Fig. 7. Customer in each cluster

Fig. 8. Male and Female in each cluster

In Fig. 7 and Fig. 8 it is observed that:
• cluster 0: The number of customers is around 110, of which around 59 are male and 51 are female.
• cluster 1: The number of customers is around 163, of which around 87 are male and 76 are female.
• cluster 2: The number of customers is around 30, of which around 18 are male and 12 are female.
• cluster 3: The number of customers is around 97, of which around 52 are male and 45 are female.

The segments can be characterized as follows:
• cluster 0 (Best customer): The average age is around 44.25; annual income is low but ratings are high. These customers gave the highest ratings, so this group consists of the best customers. E-commerce sites derive their main revenue from this group, so special promotions can be offered in order not to lose it.
• cluster 1 (Potential loyal customer): The average age is around 34.45; annual income is high but ratings are average. This group consists of potential loyal customers, and specific strategies should be developed to drive this group to become best customers.
• cluster 2 (Lost or churned customer): The average age is around 60.33; annual income is high and ratings are low. A marketing campaign is necessary to stimulate interest in the online store, and research should determine the best way to increase ratings.
• cluster 3 (New customer): The average age is around 26.52; both annual income and ratings are average. The e-commerce marketplace should interact with these customers more often in order to turn them into regular customers.

Fig. 9. Comparison between k-means and DBSCAN

A. Conclusion

This paper considered the application of three cluster analysis algorithms, k-means, hierarchical clustering, and DBSCAN, to e-commerce customer segmentation. In future work, to obtain more satisfactory customer segmentation, neural networks will be used for cluster analysis, other types of clustering algorithms such as BIRCH will also be applied to different datasets, and their performance will be evaluated as well.

REFERENCES

[1] R.-J. Kuo, C. Mei, F. E. Zulvia, and C. Tsai, “An application of a metaheuristic algorithm-based clustering ensemble method to app customer segmentation,” Neurocomputing, vol. 205, pp. 116–129, 2016.
[2] F. Al-Qaed and A. Sutcliffe, “Adaptive decision support system (ADSS) for B2C e-commerce,” in Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet, pp. 492–503, 2006.
[3] Z. Yao, P. Sarlin, T. Eklund, and B. Back, “Combining visual customer segmentation and response modeling,” Neural Computing and Applications, vol. 25, no. 1, pp. 123–134, 2014.
[4] Z. Li, “Research on customer segmentation in retailing based on clustering model,” in 2011 International Conference on Computer Science and Service System (CSSS), pp. 3437–3440, IEEE, 2011.
[5] L. Zahrotun, “Implementation of data mining technique for customer relationship management (CRM) on online shop tokodiapers.com with fuzzy c-means clustering,” in 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), pp. 299–303, IEEE, 2017.
[6] J. Wu and Z. Lin, “Research on customer segmentation model by clustering,” in Proceedings of the 7th international conference on Electronic commerce, pp. 316–318, 2005.
