Research Article
Research on Segmenting E-Commerce Customer through
an Improved K-Medoids Clustering Algorithm
Zengyuan Wu,1 Lingmin Jin,1 Jiali Zhao,1 Lizheng Jing,1 and Liang Chen2
1 College of Economics and Management, China Jiliang University, No. 258, Xueyuan Street, Hangzhou, Zhejiang 310018, China
2 College of Optical and Electronic Technology, China Jiliang University, No. 258, Xueyuan Street, Hangzhou, Zhejiang 310018, China
Received 2 March 2022; Revised 11 April 2022; Accepted 11 May 2022; Published 18 June 2022
Copyright © 2022 Zengyuan Wu et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In view of the shortcomings of traditional clustering algorithms in feature selection and clustering effect, an improved Recency, Frequency, and Monetary (RFM) model is introduced, and an improved K-medoids algorithm is proposed. The above model and algorithm are employed to segment e-commerce customers. First, the traditional RFM model is improved by adding two features of customer consumption behavior. Second, in order to overcome the defect of setting the K value artificially in the traditional K-medoids algorithm, the Calinski–Harabasz (CH) index is introduced to determine the optimal number of clusters. Meanwhile, the K-medoids algorithm is optimized by changing the selection of centroids to avoid the influence of noise and isolated points. Finally, empirical research is done using a dataset from an e-commerce platform. The results show that our improved K-medoids algorithm can improve the efficiency and accuracy of e-commerce customer segmentation.
features, we introduce customer consumption behavior data into the traditional RFM model, including data on products added to the shopping cart (C) and to favorites (V). Second, in terms of algorithm improvement, we address the problem of artificially setting the K value in the K-medoids algorithm and introduce the CH index as the clustering quality evaluation criterion to determine the best K value. Meanwhile, to address the problem that the K-medoids algorithm is sensitive to the initial clustering centers, we draw on the K-means++ algorithm to improve the selection of clustering centers. The experimental results show that the improved K-medoids algorithm can effectively alleviate the sensitivity of the algorithm to noise and to the selection of initial clustering centers. The algorithm also takes operational performance into account, so as to improve the efficiency and accuracy of e-commerce customer segmentation.

The rest of this paper is organized as follows. In Section 2, the existing literature on customer segmentation is reviewed and the research gaps are identified. In Section 3, the improved K-medoids algorithm is described in detail. In Section 4, empirical research is done using an e-commerce dataset and the empirical results are analyzed. In Section 5, the contributions, shortcomings, and future research are discussed. Finally, the conclusions are drawn in Section 6.

2. Literature Review

Existing literature on customer segmentation is divided into two fields. The first is about selecting different segmentation features. The second is about selecting and improving the clustering algorithms. In terms of the selection of segmentation features, the existing literature can be divided into three types from different perspectives [12], including the demographic perspective, the customer life cycle perspective, and the customer behavior perspective. Firstly, scholars [13] who conducted research from the perspective of demography mainly collected data using questionnaire surveys. They divide customers into different groups according to their age, gender, family income, marital status, education, etc. Secondly, literature studying this issue from the perspective of the customer life cycle [14] divides the customer life cycle into several stages according to the number of new customers, retained customers, and lost customers. In different stages, companies should take different measures for each group. The customer loyalty classification method [15, 16] is the most popular segmentation method in the existing segmentation literature. Third, with the continuous development of data mining technology, indicator selection methods based on customer behavior are becoming a hot topic. In this literature, multidimensional features are used to reflect the consumption behaviors and habits of different customer groups [17, 18]. As a classic customer value model, the RFM model has been successfully applied to customer segmentation [19, 20]. Owing to the features of different industries, some scholars have improved and extended the RFM model [21–24]. However, the consumer behavior preferences of different customer groups cannot be well identified. Yoseph et al. [25] studied consumer behavior (e.g., clicking on product links, browsing products, and adding to cart) and purchasing power, and added these features to the RFM model so that consumer categories could be accurately identified and differentiated.

The K-means algorithm and the K-medoids algorithm are the most commonly used clustering algorithms. K-means has been widely applied in the fields of data mining and pattern recognition because of its advantages such as simple operation and fast speed. However, the traditional K-means algorithm is susceptible to noise and isolated points, which leads to poor clustering results [26]. The K-medoids algorithm is another classical division-based clustering method [27]. Compared with K-means, this algorithm optimizes the selection method of the cluster center, overcomes the defect of being sensitive to isolated points, and has higher clustering accuracy. However, the K-medoids algorithm still has the problem of being vulnerable to the initial clustering centers. To address this problem, many scholars have proposed a series of improved algorithms for K-medoids.

For the problem of selecting initial clustering centers, two improvement ideas are mainly proposed in the existing literature. First, based on the K-medoids algorithm, existing studies optimize the selection of initial clustering centers using the distance or correlation between samples [28, 29]. This improved method is based on the following principle. Since the cluster centers are usually the more important sample points in a cluster, the denser the sample points are and the stronger their correlation with other sample points, the more likely they are to become the best cluster centers. Ho-Kieu et al. [28] proposed an improved initial center selection method by introducing a probability density function. The experimental results showed that the improved algorithm had obvious advantages compared with the original K-medoids algorithm. The above improved methods optimize the selection of initial clustering centers in K-medoids, reduce the number of iterations, and improve the clustering efficiency. However, these selection methods only consider the distance or correlation between samples, which easily makes the clustering results fall into a local optimum. They cannot achieve accurate clustering results for datasets with a large disparity in the number of samples between clusters.

Second, some scholars introduce Swarm Intelligence [30, 31] and combine it with K-medoids to improve the global search capability and efficiency of the improved algorithms. Arthur and Vassilvitskii [32] algorithmically fused the swarm algorithm with K-medoids. The experimental results showed that the improved algorithm effectively reduced the influence of noise on the clustering results and improved the clustering accuracy. This type of improved algorithm effectively avoids the problem of clustering results falling into a local optimum. However, it is worth noting that the integration with Swarm Intelligence leads to an increase in algorithm complexity and a reduction in operational efficiency. The huge transaction volume and mass data in e-commerce platforms require high clustering efficiency. It is necessary for platform managers to segment customers in a timely manner in order to manage e-commerce customers well. Therefore, in this paper we try to solve the sensitivity to the initial clustering centers that exists in the K-medoids
algorithm while ensuring the operational efficiency of the algorithm.

In summary, in the existing e-commerce customer segmentation literature, there are still two gaps that have not been solved well. First, from the perspective of selecting segmentation features, the existing literature focuses on using the historical order data of customers. But the consumption behavior data of customers is ignored, so the behavioral preferences and consumption habits of customers in different customer groups cannot be comprehensively reflected. Second, from the perspective of clustering algorithms, although the improved K-medoids algorithms in the existing literature alleviate the sensitivity of the algorithm to the initial clustering centers and improve the clustering performance, there are still limitations in two aspects. First, the clustering results may fall into a local optimum. Second, the algorithm may run less efficiently.

Therefore, we attempt to solve the above problems. First, when selecting segmentation features, we construct a new model by incorporating customers' online consumption behavior, in which Recency, Frequency, Monetary, Add to Cart, and Add Favorites are included. For clarity, this model is called the RFMCV model. Second, considering the defect of artificially setting the K value in the K-medoids algorithm, we introduce the CH index to determine the best K value. Third, drawing on the idea of the K-means++ algorithm [33] for selecting initial clustering centers, the K-medoids algorithm is improved. Finally, the algorithm proposed in this paper is validated on two standard test datasets.

3. Improved K-Medoids Algorithm

In this paper, we improve the K-medoids algorithm in two aspects. First, the CH evaluation index is introduced in order to determine the optimal number of clusters in the K-medoids algorithm. Second, the idea of the K-means++ algorithm is introduced when selecting initial clustering centers.

3.1. Description of the K-Medoids Algorithm. Both the K-means and K-medoids algorithms are classical division-based clustering methods, which generally use the Euclidean distance as a measure of similarity between two data points. The smaller the distance, the greater the similarity. However, the K-medoids algorithm is optimized for the selection of centroids to avoid the influence of noise and isolated points [34]. The algorithm is implemented in the following steps. First, input the dataset and the number of clusters. Second, initialize the clustering centers and assign samples. Randomly select the initial clustering centers, calculate the Euclidean distance between each remaining data point and the clustering centers, find the shortest distance, and assign all samples to the clusters corresponding to the nearest clustering center. Third, update the cluster centers. Randomly select a noncentroid point and replace a clustering center with it according to the principle of squared error function value reduction. Finally, iterative calculation is performed until the clustering centers no longer change or the maximum number of iterations is reached. Then, the cycle ends and the final clustering result is obtained.
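To make the above procedure concrete, the following is a minimal NumPy sketch of this classical K-medoids loop (random initialization followed by swap-based medoid updates). It is an illustration written for this description, not the authors' original code; the function and variable names are our own, and the total within-cluster distance is used as the cost in place of the squared error function mentioned above.

```python
import numpy as np

def kmedoids(X, k, medoids=None, max_iter=100, rng=None):
    """Classical K-medoids: (random) medoid init, then greedy swap updates."""
    rng = np.random.default_rng(rng)
    n = len(X)
    if medoids is None:
        medoids = rng.choice(n, size=k, replace=False)  # random initial centers
    medoids = np.asarray(medoids)

    for _ in range(max_iter):
        # Assign each sample to its nearest medoid (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        cost = dist[np.arange(n), labels].sum()

        # Try replacing each medoid with a random non-medoid point; keep the
        # swap only if it reduces the total within-cluster distance.
        new_medoids = medoids.copy()
        for j in range(k):
            candidate = int(rng.integers(n))
            if candidate in new_medoids:
                continue
            trial = new_medoids.copy()
            trial[j] = candidate
            d = np.linalg.norm(X[:, None, :] - X[trial][None, :, :], axis=2)
            if d.min(axis=1).sum() < cost:
                new_medoids, cost = trial, d.min(axis=1).sum()

        if np.array_equal(new_medoids, medoids):
            break  # centers stable: stop iterating
        medoids = new_medoids

    # Final assignment with the converged medoids.
    dist = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return medoids, dist.argmin(axis=1)
```

The medoids argument is included so that the improved initialization of Section 3.2 can be plugged into the same loop.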
3.2. Implementation Procedure of the Improved K-Medoids Algorithm. The implementation procedure of the improved K-medoids algorithm is summarized in Algorithm 1.

Algorithm 1: Improved K-medoids algorithm.
Input: dataset X = {x1, x2, . . ., xn}, where n is the number of data points.
Step 1: Randomly select one sample from the dataset as the initial clustering center C1.
Step 2: First, calculate the shortest distance D(x) between each sample and the existing clustering centers. Second, calculate the probability P(x) that a sample is selected as the next clustering center: P(x) = D(x)^2 / Σ_{x′∈X} D(x′)^2. Third, generate a random number Ri in the interval (0, 1) and subtract the probabilities P(x) from Ri one by one. Finally, when the running difference becomes less than or equal to 0 for the first time, the corresponding sample is the next clustering center.
Step 3: Repeat Step 2 until K clustering centers are selected.
Step 4: Assign samples. Calculate the Euclidean distance between each remaining data point and the cluster centers Ci, then find the shortest distance. Assign all samples to the clusters corresponding to the nearest cluster center Ci.
Step 5: Update the cluster centers. Randomly select a noncentral point Crandom and replace Ci with Crandom to update the cluster center of each cluster according to the principle of squared error function value reduction.
Step 6: Repeat Step 4 and Step 5 until the cluster centers no longer change or the maximum number of iterations is reached; the cycle then ends and the final clustering result is obtained.
Output: clustering result C = {c1, c2, . . ., ck}.
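To illustrate Steps 1–3, the snippet below is a small sketch (our own, not the authors' code) of the D(x)²-weighted roulette seeding that Algorithm 1 borrows from K-means++; it returns medoid indices that can be passed as the medoids argument of the kmedoids() sketch in Section 3.1.

```python
import numpy as np

def seed_medoids(X, k, rng=None):
    """Select k initial medoids by D(x)^2-weighted roulette selection
    (Steps 1-3 of Algorithm 1). Returns indices into X."""
    rng = np.random.default_rng(rng)
    n = len(X)
    centers = [int(rng.integers(n))]  # Step 1: one random initial center

    while len(centers) < k:  # Step 3: repeat until k centers are chosen
        # Step 2: D(x) = distance from each sample to its nearest center.
        d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
        D = d.min(axis=1)
        P = D ** 2 / (D ** 2).sum()  # selection probabilities

        # Roulette selection: subtract each P(x) from a random number in
        # (0, 1); the sample at which the running difference first drops
        # to 0 or below becomes the next clustering center.
        r = rng.random()
        for i, p in enumerate(P):
            r -= p
            if r <= 0:
                centers.append(i)
                break

    return np.array(centers)
```

Calling kmedoids(X, k, medoids=seed_medoids(X, k)) then reproduces the overall structure of Algorithm 1: seeded initialization followed by assignment and swap-based updates. Compared with purely random initialization, this seeding spreads the initial centers apart, which is what alleviates the sensitivity to the initial clustering centers.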
3.3. Determine the Optimal Number of Clusters k. We introduce the CH clustering quality evaluation index [32] and set the number of clusters to the value corresponding to the highest CH value. The CH value is the ratio of intercluster sample separation to intracluster sample tightness, and a larger CH represents tighter clusters and more dispersed classes (i.e., a better clustering result). When the clusters are internally dense and well separated from each other, the optimal number of clusters can be clearly read off the line graph of CH values, and the index has the advantage of fast calculation speed.

The calculation formula of the CH value is

S(k) = (BGSS / WGSS) × ((m − k) / (k − 1)),  (1)

where m is the total number of samples and k is the number of clusters; WGSS and BGSS are defined as follows.

Within-Groups Sum of Squared Error (WGSS) is the sum of squared errors within clusters. It is used to measure the tightness of samples within clusters. The smaller the WGSS, the tighter the clusters and the better the clustering effect. Its calculation formula is

WGSS = (1/2) [(m_1 − 1) d_1^2 + · · · + (m_k − 1) d_k^2],  (2)

where d_k^2 is the average distance of samples within the k-th cluster and m_k is the number of samples in the k-th cluster.

Between-Groups Sum of Squared Error (BGSS) is the sum of squared errors between clusters, which is used to measure the separation of samples between clusters. The larger the BGSS, the more dispersed the clusters and the better the clustering effect. Its calculation formula is

BGSS = (1/2) [(k − 1) d^2 + Σ_{j=1}^{k} (m_j − 1)(d^2 − d_j^2)],  (3)

where d^2 is the average distance between all samples, d_j^2 is the average distance of samples within the j-th cluster, m_j is the number of samples in the j-th cluster, and k is the number of clusters.
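Equations (1)–(3) translate almost directly into code. The sketch below is our reading of the reconstructed formulas (in particular, d² is interpreted as the mean squared pairwise distance); it evaluates S(k) for a candidate clustering and can be used to pick the k with the highest CH value, reusing the kmedoids() sketch above.

```python
import numpy as np
from scipy.spatial.distance import pdist

def ch_index(X, labels):
    """CH value S(k) from equations (1)-(3), using mean squared
    pairwise distances for d^2 and d_j^2."""
    m, k = len(X), len(np.unique(labels))
    d2_all = pdist(X, "sqeuclidean").mean()  # d^2 over all samples

    wgss, bgss = 0.0, (k - 1) * d2_all / 2.0
    for j in np.unique(labels):
        C = X[labels == j]
        mj = len(C)
        d2_j = pdist(C, "sqeuclidean").mean() if mj > 1 else 0.0
        wgss += (mj - 1) * d2_j / 2.0              # equation (2)
        bgss += (mj - 1) * (d2_all - d2_j) / 2.0   # equation (3)

    return (bgss / wgss) * ((m - k) / (k - 1))     # equation (1)

# Choose k as the arg-max of S(k) over a candidate range, e.g. 2..19:
# best_k = max(range(2, 20), key=lambda k: ch_index(X, kmedoids(X, k)[1]))
```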
3.4. Comparison and Validation. In order to verify the effectiveness of the improved K-medoids algorithm proposed in this paper, two comparison experiments are conducted. First, we compare the performance of the clustering algorithms. Second, we compare the clustering quality evaluation indicators.

3.4.1. Comparison of Algorithm Performance. In order to verify the effectiveness of the algorithm, two standard test datasets were selected for the experiments: breast cancer [35] and iris plants [36] from the UCI database. The UCI database, built by the University of California, Irvine, is the most popular dataset repository in the field of machine learning. Furthermore, the K-medoids, K-means++, and spectral clustering (SC) methods were selected for comparison with the improved K-medoids algorithm proposed in this paper. Both the clustering accuracy (ACC) and the running time of the 4 algorithms on the two datasets were compared. The results are shown in Table 1.

Table 1: The performance of 4 algorithms working on different datasets.

                          Breast cancer          Iris plants
Clustering algorithm      ACC      Time (ms)     ACC      Time (ms)
K-medoids                 0.858    33.1          0.663    26.5
K-means++                 0.854    208.2         0.833    265.0
Spectral clustering       0.667    103.8         0.900    118.1
Improved K-medoids        0.868    22.7          0.840    13.9

As can be seen from Table 1, the improved K-medoids algorithm has an accuracy of 86.8% on the breast cancer dataset, outperforming the K-medoids, K-means++, and spectral clustering methods in terms of clustering accuracy. Meanwhile, the running time of the improved K-medoids algorithm, 22.7 ms, is shorter than that of the other 3 algorithms. On the iris plants dataset, the improved K-medoids algorithm has the highest accuracy of 84% and the shortest running time of 13.9 ms. Therefore, among the four algorithms, the improved K-medoids algorithm has the best performance in terms of accuracy and clustering efficiency. Based on the above analysis, the improved K-medoids algorithm proposed in this paper outperforms the other three clustering methods on both datasets.
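A sketch of how such a comparison could be reproduced is given below: it loads the two UCI datasets through scikit-learn, times each clusterer, and scores accuracy by optimally matching cluster labels to class labels with the Hungarian algorithm. The timing harness and the label-matching convention are our assumptions; the paper does not specify its experimental setup.

```python
import time
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.cluster import KMeans, SpectralClustering

def accuracy(y_true, y_pred):
    """Clustering ACC: best one-to-one mapping of clusters to classes."""
    k = max(y_true.max(), y_pred.max()) + 1
    cm = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    rows, cols = linear_sum_assignment(-cm)  # maximize matched counts
    return cm[rows, cols].sum() / len(y_true)

for name, loader in [("breast cancer", load_breast_cancer), ("iris", load_iris)]:
    X, y = loader(return_X_y=True)
    k = len(np.unique(y))
    algos = {
        "K-medoids": lambda X: kmedoids(X, k, rng=0)[1],
        "K-means++": lambda X: KMeans(n_clusters=k, init="k-means++",
                                      n_init=10).fit_predict(X),
        "Spectral clustering": lambda X: SpectralClustering(
                                      n_clusters=k).fit_predict(X),
        "Improved K-medoids": lambda X: kmedoids(
                                      X, k, medoids=seed_medoids(X, k, rng=0),
                                      rng=0)[1],
    }
    for algo, fit in algos.items():
        t0 = time.perf_counter()
        labels = fit(X)
        ms = (time.perf_counter() - t0) * 1000
        print(f"{name:13s} {algo:19s} ACC={accuracy(y, labels):.3f} "
              f"time={ms:.1f} ms")
```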
3.4.2. Comparison of Clustering Quality Evaluation Indicators. In order to determine the best K value, the CH index is introduced in this paper. In order to verify the applicability of the CH index for customer segmentation in the e-commerce industry, we apply it to the e-commerce dataset in practice. Furthermore, the result is compared with the inflection point method. The experimental result of the CH value is shown in Figure 1. The experimental result of the inflection point method is shown in Figure 2.

As can be seen from Figure 1, the line chart of the CH value first rises and then falls, and the highest CH value is obtained when the number of clusters is 4. Therefore, using the CH index, it can be clearly concluded that the optimal number of clusters for this e-commerce platform dataset is 4.

The principle of the inflection point method is to take the optimal number of clusters at the inflection point of the line graph, because continuing to increase the K value after the inflection point does not increase the classification accuracy much, but does increase the number of clusters. In Figure 2, the horizontal axis is the number of clusters, and the vertical axis is the sum of squares due to error (SSE). As can be seen in Figure 2, when the K value changes from 4 to 19, the line graph changes smoothly (i.e., there is no obvious inflection point from which to accurately determine the optimal number of clusters).

The above analysis shows that the CH index is better than the inflection point method in the segmentation of e-commerce customers.
Figure 1: Line chart of CH value (horizontal axis: number of clusters, 2–19; vertical axis: CH value).
Figure 2: Line chart of inflection point method (horizontal axis: number of clusters, 2–19; vertical axis: SSE).
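For reference, the two diagnostics in Figures 1 and 2 could be generated with a sketch like the following, which sweeps k from 2 to 19 and plots the CH value (via the ch_index sketch above) next to the SSE used by the inflection point method; the plotting details are our own and are not taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

ks = range(2, 20)
ch_values, sse_values = [], []
for k in ks:
    medoids, labels = kmedoids(X, k, medoids=seed_medoids(X, k, rng=0), rng=0)
    ch_values.append(ch_index(X, labels))
    # SSE: sum of squared distances of samples to their cluster medoid.
    d = np.linalg.norm(X - X[medoids][labels], axis=1)
    sse_values.append((d ** 2).sum())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ks, ch_values, marker="o")   # Figure 1: pick k at the peak
ax1.set(xlabel="Number of clusters", ylabel="CH")
ax2.plot(ks, sse_values, marker="o")  # Figure 2: look for an elbow
ax2.set(xlabel="Number of clusters", ylabel="SSE")
plt.show()
```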
4. Empirical Analysis

4.1. Selecting Features for Customer Segmentation. The RFM model was first proposed by Hughes [10] and is generally used as an analysis tool to identify an organization's best customers. The RFM model is based on 3 factors: Recency (R), Frequency (F), and Monetary value (M). Recency (R) usually represents how recently a customer has made a purchase. The more recently a customer has made a purchase, the more likely he is to continue the relationship. Frequency (F) usually represents how often a customer makes a purchase within the observation period. A larger F-value means that the customer consumes more frequently and thus has higher customer value. Monetary (M) usually represents how much money a customer spends on purchases within the observation period. The larger the M-value, the higher the customer value. Since its introduction, the RFM model has been widely used in customer segmentation [29].

The traditional RFM model has been widely used for customer segmentation in various industries. However, there are still several problems. The RFM model cannot reflect the customer's activity on the e-commerce platform or the differences in consumption and behavior between different customer groups. With the development of big data technology, the dimensions of customer data extracted from e-commerce platforms are increasing, and these data reflect customers' value characteristics, consumption habits, and behavioral preferences in a more detailed and comprehensive way. Therefore, based on the traditional RFM model, we integrated customers' online behavioral indicators and proposed the RFMCV model for e-commerce customer segmentation, in which the C and V indicators reflect customers' activity and online consumption habits. Add to cart (C) represents the frequency with which a consumer has added a product to their shopping cart. Add favorites (V) represents the frequency with which a consumer has added a product to their favorites. Both of these behaviors represent the consumer's preference for a product. The higher the frequency, the more likely the consumer is to buy the product. The introduction of these two indicators into the RFM model can effectively improve the effectiveness of the RFM model for e-commerce customer segmentation [25].

4.2. Data Description. The customer consumption data in this paper is from the Kaggle database [37]. It contains 100,000 orders from multiple marketplaces in Brazil from 2016 to 2018. Many features are contained in this dataset, ranging from order status, price, payment, and freight performance to
customer location, product attributes, and reviews written by customers. The order and online behavior data of 37,376 customers were then extracted from this dataset. The consumption time is from November 18, 2017, to December 18, 2017. In order to segment e-commerce customers, we select 5 fields. The fields involved in this dataset and their descriptions are shown in Table 2.

4.3. Data Preprocessing

4.3.1. Data Cleaning. The behavioral data of these e-commerce customers over a month amounts to about 100,000 records, and data cleaning is needed. Firstly, data with missing and abnormal values are processed, such as data with zero expense, data with the purchase date as an idle value, and data with obviously wrong expenses. Secondly, duplicate data are processed. The user's purchase behavior is recorded to the hour. A small number of users repeatedly purchase or add favorites within an hour, so this kind of data is deduplicated. Finally, the consistency of the data is dealt with. The indicator R involves time features. The date and hour in the time data exist in one field, so the field is split into two fields. In addition, we convert the Timestamp field into year, month, and day form to facilitate the calculation of time intervals.
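As an illustration only (the paper does not publish its preprocessing script), the cleaning steps described above might look like this in pandas; the file name and the column names customer_unique_id, product_id, price, timestamp, and behavior_type are hypothetical.

```python
import pandas as pd

df = pd.read_csv("olist_behavior.csv")  # hypothetical export of the Kaggle data

# 1. Missing and abnormal values: drop rows with no expense or purchase
#    date, and rows with an obviously wrong (non-positive) expense.
df = df.dropna(subset=["price", "timestamp"])
df = df[df["price"] > 0]

# 2. Duplicates: behavior is recorded to the hour, so collapse repeated
#    purchases/favorites by the same user on the same product in one hour.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["hour_bucket"] = df["timestamp"].dt.floor("h")
df = df.drop_duplicates(
    subset=["customer_unique_id", "product_id", "behavior_type", "hour_bucket"])

# 3. Consistency: split the combined date-hour field into two fields and
#    keep a year-month-day date for computing time intervals.
df["date"] = df["timestamp"].dt.date
df["hour"] = df["timestamp"].dt.hour
```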
4.3.2. Indicator Extraction and Normalization. The individual indicators in the RFMCV model are explained in detail as follows:

R: recency: the time interval between the customer's last purchase in the observation period and 31 December 2017.
F: frequency of customer purchases in the observation period.
M: monetary: the amount spent by the customer in the observation period.
C: frequency with which the customer added products to the cart in the observation period.
V: frequency with which the customer added products to favorites in the observation period.

According to the RFMCV model proposed in this paper, 37,376 samples are collected, and some of them are shown in Table 3.
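Under the same hypothetical schema as the cleaning sketch above, the five indicators could be extracted per customer roughly as follows; the reference date of 31 December 2017 comes from the definition of R above, while everything else (column names, behavior-type codes) is assumed.

```python
import pandas as pd

REF_DATE = pd.Timestamp("2017-12-31")
buys = df[df["behavior_type"] == "purchase"]

rfmcv = pd.DataFrame({
    # R: days between the last purchase and the reference date.
    "R": (REF_DATE - buys.groupby("customer_unique_id")["timestamp"].max()).dt.days,
    # F: number of purchases in the observation period.
    "F": buys.groupby("customer_unique_id").size(),
    # M: total amount spent in the observation period.
    "M": buys.groupby("customer_unique_id")["price"].sum(),
    # C / V: add-to-cart and add-to-favorites frequencies.
    "C": df[df["behavior_type"] == "cart"].groupby("customer_unique_id").size(),
    "V": df[df["behavior_type"] == "favorite"].groupby("customer_unique_id").size(),
}).fillna(0)  # customers with no cart/favorite events get frequency 0
```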
In order to avoid the disparity caused by the different units of each indicator, the dataset needs to be normalized after indicator extraction and prior to experimental analysis. The Z-score normalization method is employed in this paper, which normalizes the data using the mean and standard deviation of the original data. The processed data follow the standard normal distribution (i.e., the mean value is 0 and the standard deviation is 1). The transformation function is

X* = (X − μ) / σ,  (4)

where μ is the mean of all samples and σ is the standard deviation of all samples.

After the normalization process, all data were converted to dimensionless data. Partial data is shown in Table 4.
Table 4: Partial data of the RFMCV model after normalization.

Customer_unique_id    R          F          M          C          V
5                     −0.000902  2.068466   −0.097700  −0.745080  −0.397498
18                    −0.623415  −0.018191  1.465430   1.340590   −0.390554
22                    1.247733   3.807347   0.597041   −0.390554  −0.390554
...                   ...        ...        ...        ...        ...
906311                −0.625219  −0.365967  −1.139736  −0.397498  −0.390554
906338                0.935574   1.025137   0.324119   0.167400   −0.390554
906355                0.311257   −0.677361  −0.097700  0.428109   −0.390554
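Equation (4) amounts to a one-liner per column; here is a short sketch applied to the rfmcv frame built above (scikit-learn's StandardScaler would be equivalent):

```python
# Z-score each indicator: X* = (X - mu) / sigma, per equation (4).
rfmcv_z = (rfmcv - rfmcv.mean()) / rfmcv.std(ddof=0)  # population std

# Sanity check: each column now has mean ~0 and standard deviation ~1.
assert rfmcv_z.mean().abs().max() < 1e-9

X = rfmcv_z.to_numpy()  # feature matrix fed to the improved K-medoids
```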
Figure 3: Distribution chart of four groups (horizontal axis: Type A–Type D customers; vertical axis: cluster center value; series: R, F, M, C, V).
4.4. Analysis of Empirical Results. According to the experimental results in Section 3.4.2, the optimal number of clusters k is 4. Based on the RFMCV model, the improved K-medoids algorithm is run. The results show that all customers are divided into 4 groups, named Type A, Type B, Type C, and Type D. The distribution of each indicator of the RFMCV model for the four customer types is shown in Figure 3.

Comparing the customer indicators of the 4 groups in Figure 3, some findings can be drawn.

The value of Type B customers is the highest. This group includes 13,415 customers, accounting for 35.89% of total e-commerce customers. The R-value of the Type B customers is smaller; their last purchase on this platform is more recent. The F-value is the highest, suggesting that their purchase frequency is high and that they are active customers on this e-commerce platform. The M-value is the biggest; they spend the most on this platform. The C-value is the biggest; they add to cart most frequently. However, the V-value is small, which shows that these customers tend to add to cart rather than add favorites when they find interesting products. This group has the highest current value and value-added potential and should be classified as the high-value customer group of this e-commerce platform. For this group, platform owners should put significant effort and resources into maintaining and developing good relationships with them. Effective measures should be taken to tap their consumption potential.

The second most valuable customer group is Type A, which includes 7,463 customers, accounting for 19.97% of total customers. The R-value of the Type A customers is smaller than that of Type B and Type D, and they made a purchase most recently. Both the F-value and M-value of Type A are the second biggest among the 4 groups. They are more active customers and spend more on this e-commerce platform. Different from Type B, the C-value of these customers is low, but the V-value is the highest among the four groups. It shows that these customers are used to adding favorites when they find interesting products. According to the above analysis, customers of Type A can be classified as the second most valuable group. These customers have great potential for value mining. The platform owners should hold promotional activities in order to stimulate their consumption potential.

The third customer group is Type D, which includes 14,340 customers, accounting for 38.37% of total e-commerce customers. These customers have the biggest R-value, indicating that they have not purchased goods from this platform for a long time. The F-value, M-value, C-value, and V-value are all small, indicating that this group of customers is inactive on this e-commerce platform. They do not frequently add favorites or add to cart on the platform. They can be classified as a low-value customer group. However, the number of customers in this group is big, and their consumption frequency is medium. It is necessary for platform owners to enhance the value of this group by pushing personalized products.

The fourth customer group is Type C, including 2,158 customers, accounting for 5.77% of total e-commerce customers. The R-value of this customer group is low, and the F-value is the smallest, indicating that this group has recently spent money on the platform, but the overall consumption frequency is low. The M-value, C-value, and V-value are the smallest; they are also inactive customers. Unlike the customers of Type D, they completed their last purchase very recently, so they are likely to be new customers. Special attention needs to be paid to them. It is important to understand their needs and develop good relationships with them.
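Tying the pieces together, the segmentation reported here could be reproduced in outline as follows. This is our sketch, not the authors' code: cluster numbering, and hence the A–D naming, depends on initialization and is assigned by inspecting the centers against Figure 3.

```python
import pandas as pd

k = 4  # optimal number of clusters from the CH index (Figure 1)
medoids, labels = kmedoids(X, k, medoids=seed_medoids(X, k, rng=0), rng=0)

# Inspect the cluster centers on the five indicators, as in Figure 3:
# high F/M/C -> Type B, high V -> Type A, high R -> Type D, low F -> Type C.
centers = pd.DataFrame(X[medoids], columns=["R", "F", "M", "C", "V"])
sizes = pd.Series(labels).value_counts().sort_index()
print(centers.assign(size=sizes.values, share=(sizes / len(X)).round(4).values))
```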
[3] W.-Y. Chiang, "Establishing high value markets for data-driven customer relationship management systems," Kybernetes, vol. 48, no. 3, pp. 650–662, 2019.
[4] E. Umuhoza, D. Ntirushwamaboko, J. Awuah, and B. Birir, "Using unsupervised machine learning techniques for behavioral-based credit card users segmentation in Africa," SAIEE Africa Research Journal, vol. 111, no. 3, pp. 95–101, 2020.
[5] Y. Deng and Q. Gao, "A study on E-commerce customer segmentation management based on improved K-means algorithm," Information Systems and e-Business Management, vol. 18, no. 4, pp. 497–510, 2018.
[6] H. Güçdemir and H. Selim, "Corrigendum to "Integrating simulation modelling and multi criteria decision making for customer focused scheduling in job shops" [Simulation Modelling Practice and Theory 88 (2018) 17-31]," Simulation Modelling Practice and Theory, vol. 100, Article ID 101990, 2020.
[7] G. Sun, X. F. Xie, J. Y. B. Zeng et al., "Using improved RFM model to classify consumer in big data environment," International Journal of Embedded Systems, vol. 14, no. 1, pp. 54–64, 2020.
[8] Q. S. Wang, X. Yang, P. J. Song, and C. L. Sia, "Consumer segmentation analysis of multichannel and multistage consumption: a latent class MNL approach," Journal of Electronic Commerce Research, vol. 15, no. 4, pp. 339–358, 2014.
[9] R. Punhani, V. P. S. Arora, A. Sai Sabitha, and V. K. Shukla, "Segmenting E-commerce customer through data mining techniques," Journal of Physics: Conference Series, vol. 1714, no. 1, Article ID 012026, 2021.
[10] A. M. Hughes, Strategic Database Marketing, Probus Publishing Company, New York, NY, USA, 1994.
[11] C. Hennig and T. F. Liao, "How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 62, no. 3, pp. 309–369, 2013.
[12] L. B. Romdhane, N. Fadhel, and B. Ayeb, "An efficient approach for building customer profiles from business data," Expert Systems with Applications, vol. 37, no. 2, pp. 1573–1585, 2010.
[13] P. B. Chou, E. Grossman, D. Gunopulos, and P. Kamesam, "Identifying prospective customers," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '00), pp. 447–456, Boston, MA, USA, August 2000.
[14] W. Lan, "The impact of perception difference on channel conflict: a customer relationship life cycle view," Journal of Service Science and Management, vol. 8, no. 5, pp. 655–661, 2015.
[15] W. Buckinx, G. Verstraeten, and D. Van den Poel, "Predicting customer loyalty using the internal transactional database," Expert Systems with Applications, vol. 32, no. 1, pp. 125–134, 2007.
[16] C. Martin, P. Adrian, and B. David, Relationship Marketing, Butterworth-Heinemann Ltd, London, UK, 1998.
[17] S. Peker, A. Kocyigit, and P. E. Eren, "LRFMP model for customer segmentation in the grocery retail industry: a case study," Marketing Intelligence & Planning, vol. 35, no. 4, pp. 544–559, 2017.
[18] Q. Zhang, A. R. Abdullah, C. W. Chong, and M. H. Ali, "E-commerce information system management based on data mining and neural network algorithms," Computational Intelligence and Neuroscience, vol. 2022, Article ID 1499801, 11 pages, 2022.
[19] P. A. Sarvari, A. Ustundag, and H. Takci, "Performance evaluation of different customer segmentation approaches based on RFM and demographics analysis," Kybernetes, vol. 45, no. 7, pp. 1129–1157, 2016.
[20] M. Song, X. Zhao, H. E, and Z. Ou, "Statistics-based CRM approach via time series segmenting RFM on large scale data," Knowledge-Based Systems, vol. 132, pp. 21–29, 2017.
[21] W.-Y. Chiang, "To mine association rules of customer values via a data mining procedure with improved model: an empirical case study," Expert Systems with Applications, vol. 38, no. 3, pp. 1716–1722, 2011.
[22] B. Zhao, W. Li, Q. Guo, and R. Song, "E-commerce picture text recognition information system based on deep learning," Computational Intelligence and Neuroscience, vol. 2022, Article ID 9474245, 11 pages, 2022.
[23] H. Li, X. Yang, Y. Xia, L. Zheng, G. Yang, and P. Lv, "K-LRFMD: method of customer value segmentation in shared transportation filed based on improved K-means algorithm," Journal of Physics: Conference Series, vol. 1060, no. 1, Article ID 012012, 2018.
[24] Z. Wu, C. Zhou, F. Xu, and W. Lou, "A CS-AdaBoost-BP model for product quality inspection," Annals of Operations Research, vol. 308, no. 1-2, pp. 685–701, 2020.
[25] F. Yoseph, N. H. Ahamed Hassain Malim, M. Heikkilä, A. Brezulianu, O. Geman, and N. A. Paskhal Rostam, "The impact of big data market segmentation using data mining and clustering techniques," Journal of Intelligent & Fuzzy Systems, vol. 38, no. 5, pp. 6159–6173, 2020.
[26] J. Deng, J. Guo, and Y. Wang, "A novel K-medoids clustering recommendation algorithm based on probability distribution for collaborative filtering," Knowledge-Based Systems, vol. 175, no. 1, pp. 96–106, 2019.
[27] H.-S. Park and C.-H. Jun, "A simple and fast algorithm for K-medoids clustering," Expert Systems with Applications, vol. 36, no. 2, pp. 3336–3341, 2009.
[28] D. Ho-Kieu, T. Vo-Van, and T. Nguyen-Trang, "Clustering for probability density functions by new k-medoids method," Scientific Programming, vol. 2018, Article ID 2764016, 7 pages, 2018.
[29] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, no. 1, pp. 200–226, 2018.
[30] G. Surya Narayana and D. Vasumathi, "An attributes similarity-based K-medoids clustering technique in data mining," Arabian Journal for Science and Engineering, vol. 43, no. 8, pp. 3979–3992, 2018.
[31] Z. Pooranian, M. Shojafar, J. H. Abawajy, and A. Abraham, "An efficient meta-heuristic algorithm for grid computing," Journal of Combinatorial Optimization, vol. 30, no. 3, pp. 413–434, 2015.
[32] D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, January 2007.
[33] M. J. Brusco, D. Steinley, and J. Stevens, "K-medoids inverse regression," Communications in Statistics - Theory and Methods, vol. 48, no. 20, pp. 4999–5011, 2019.
[34] T. Y. Kim, S. Kim, J. A. Kim et al., "Automatic identification of Java method naming patterns using cascade K-medoids," KSII Transactions on Internet and Information Systems, vol. 12, no. 2, pp. 873–891, 2018.