
International Journal of Information Technology & Decision Making
Vol. 21, No. 3 (2022) 885–910
© World Scientific Publishing Company
DOI: 10.1142/S0219622022500079

Sequential Clustering and Classification Approach
to Analyze Sales Performance of Retail Stores Based
on Point-of-Sale Data

Chao-Lung Yang*,‡ and Thi Phuong Quyen Nguyen†,§

*Department of Industrial Management
National Taiwan University of Science and Technology
No. 43, Sec. 4, Keelung Rd., Daan Dist., Taipei, Taiwan

†Faculty of Project Management
The University of Danang – University of Science and Technology
54 Nguyen Luong Bang, Danang, Vietnam

‡[email protected] (corresponding author)
§[email protected]

Received 5 August 2020
Revised 16 December 2021
Accepted 16 December 2021
Published 10 February 2022

Point-of-sale (POS) data analysis is commonly used to explore sales performance in retail
business. This manuscript combines unsupervised clustering and supervised classifica-
tion methods in an integrated data analysis framework to analyze real-world POS data.
A clustering method, performed on the sales dataset, groups the stores into several
clusters. The clustering results (data labels) are then combined with other information in
the store features dataset as the inputs of the classification model, which classifies the
clustering labels using the store features dataset. The non-dominated sorting genetic
algorithm-II (NSGA-II) is applied in the framework to handle the multiple objectives of
clustering and classification. The experimental case study shows that the clustering results
can reveal the hidden structure of the sales performance of retail stores, while the
classification can reveal the major factors that affect the sales performance in different
groups of retail stores. The correlations between the sales clusters and the store information
can be obtained sequentially through a series of data analyses with the proposed framework.

Keywords: Data clustering; data classification; multi-objective optimization; non-dominated
sorting genetic algorithm; point-of-sale data.

1. Introduction
Point-of-sale (POS) data are commonly used for sales performance analysis, for
example, analyzing sales effects,1 improving customer service,2 and sales prediction.3,4
The transactional data contain statistical information that identifies a particular
customer who bought a set of products at certain prices, at a store or branch, and


time.5,6 Fisher and Raman also mentioned that the proper analysis of sales data can
be extremely useful to a retailer in improving decision-making on optimizing store
assortments and dynamic pricing of items.7 Understanding POS data not only
reveals customers' purchase behaviors but also helps enhance supply
chain management with more effective inventory control. For instance, POS data
analysis can discover how well an item sold. Using this information, a
retailer can adjust the ordering level correspondingly, as well as make pricing deci-
sions that adapt to the trend and seasonality of the item at different periods.8
Duchessi et al. grouped the existing techniques used for retail POS data
analysis into three approaches9: (1) query-based methods, (2) statistical techniques,
and (3) data-mining or machine learning methods. Recently, Choi et al. also sum-
marized that statistics, machine learning, data mining, and optimization are the
main big data analytics techniques utilized in operations management
areas.10 Essentially, the query-based methods using a database query language (SQL) are
regarded as easy-to-use techniques on datasets with predefined columns/
features. However, these techniques are unable to detect hidden information
and knowledge. Statistical techniques such as multivariate analysis of variance
(MANOVA), cluster analysis, regression analysis, and factor analysis can perform
multivariate pattern analysis for association discovery. Although statistical tech-
niques provide scientific inference, it is hard to interpret the results on multi-
dimensional datasets in a big data environment, as well as to construct a statistical
approach when the assumption of data distribution might not hold. Data-mining
or machine learning algorithms such as tree induction, neural
network induction, and association rule induction can provide promising analytics at
a relatively higher level of data complexity in large-scale data than the other
approaches. Conversely, the "algorithm-based" approach requires professional model
building and the coding of data into classes to handle big data.
In general, a framework that can conduct data mining functions simply and
straightforwardly to analyze POS data is necessary for daily or weekly retailing
analysis. For example, a manager who is in charge of supervising multiple stores or
branches would like to investigate the sales performance of each store and also to
find out which factors influence the sales performance. To conduct the analysis,
the performance of each store can usually be represented by the revenue received
within a specified period, such as a season, month, or week. To differentiate store
performance, a grouping technique or clustering method is required to cluster the
stores by their multivariate time series revenue data. Based on the clustering result, a
manager would like to find out whether a special sales pattern exists in the
clustering results. Besides, the manager might want to know further why those stores
have similar sales performance. Is this similarity based on store geometry, location,
customer demographics, or local competition? In other words, revealing the factors
related to store similarity in terms of sales performance can help in
managing the assortment of items in similar stores ("clusters") rather than trying
to deal with each store or each item individually. Factors regarding the
stores, such as location, human resources, population around the store, available
transportation nearby, and so on, are critical in influencing sales per-
formance. Thus, utilizing classification analysis to find the factors that most
significantly affect sales is important.
Based on the analytics scenarios addressed above, a systematic data analysis
framework is needed to handle daily or weekly analysis tasks as a routine. In this
research, a data mining framework combining data clustering and classification tasks
using the fast and elitist non-dominated sorting genetic algorithm (NSGAII-SCC)11
is employed to analyze POS data. Clustering stores or sale items by their sales records
helps reveal the hidden information behind the selling pattern. Classification
analysis is applied to identify the factors that are likely correlated with the found
sales pattern in each cluster. By combining clustering and classification in data
exploration, an interesting research question is raised: when utilizing the clus-
tering method on the performance measures and then further using the classification
method to reveal the correlated factors, how can the clustering and classification
effectiveness (or quality) be maintained to ensure confidence in the POS data analysis?
The complexity of this combined analysis depends on the number of ways to
cluster retail stores and, for each clustering case, the multiple possibilities of the
associated factors related to the found store clusters (in terms of sales performance). As
studied in the literature, the clustering problem is an NP-hard optimization prob-
lem,12 and searching for the influential factors, the so-called feature selection for classi-
fication, is also a complicated task, especially for a large dataset.13 When the
search space for finding the correlation between the influential factors and retail
performance from POS data is large, how to conduct the data analysis to meet the
need of investigating retailing performance is an interesting research question.
To attack this research question, the NSGAII-SCC method simultaneously opti-
mizes the solution quality of clustering and classification. The objective functions are
formed as (1) minimizing the clustering compactness when performing store clustering
based on the revenue performance and (2) maximizing the classification accuracy
when investigating factors that likely contribute to classifying the revenue perfor-
mance of each store cluster. For this combined data analysis, the number of clusters
(k), which is a user-defined parameter, is crucial and needs further investigation.
Too many clusters may make it difficult for classification to recognize the correlated
factors and cause a loss of generality. On the other hand, too few clusters may make the
partitions so vague that the classification result becomes too general and loses
investigative capability. Thus, this study uses the Technique for Order of Preference by
Similarity to Ideal Solution (TOPSIS)14 to analyze the Pareto-optimal solu-
tions and determine the best number of clusters with its corresponding solution for the
framework. Besides, a stepwise regression technique15 is employed to identify the sig-
nificant features for classification that are highly correlated with the store patterns.
This paper is structured as follows. Section 2 reviews POS data analysis and
clustering and classification techniques. Section 3 describes the methodology of the
proposed framework. Section 4 shows the experimental results of applying the
proposed framework to the POS data of a bakery retail chain as a case study.
Section 5 presents the conclusion and future research directions.

2. Literature Review
2.1. POS data analysis
The POS system contains information on sold products such as the item number,
price, selling time, location, and customer information. Sales data are convenient to
collect and manage using a POS system and have been applied widely and
effectively in different commercial activities. The POS data are stored and an-
alyzed so that inventory and supply information can be provided rapidly to
decrease labor costs and improve productivity. For example, Croson and Donohue
studied the impact of sharing POS data on reducing the bullwhip effect in supply
chain management.16 Sharing POS information increased supply chain performance
by decreasing the magnitude of order oscillations for manufacturers, distributors,
wholesalers, and retailers. Karen Stein analyzed POS data for food service and built a
system to expedite customer service and support managers in controlling their
business, such as who is dining and what is selling, as well as the inventory.17 Li et al.
analyzed financial data to define groups of customer behaviors and po-
tential risks. In their study, a new cluster validation index and a novel penalty-
function-based solver were proposed to automatically detect customer groups as well
as refine the clusters' centroids and hyperellipsoidal scopes.18
In the literature, using POS data to predict product sales is one of the popular
research topics in POS data analysis. POS data can be processed in different ways for
sales forecasting, which can bring great effects on marketing and sales strategies.3
POS data are formed in chronological order; as a result, time series models are
commonly used. Sundararaman et al. adopted POS data and considered the
databases of trends, seasonality, and seasonal indices using centered moving averages
and normalization.4 The Holt-Winters model used the original trend and seasonal factor
to calculate the average value of POS data. By deleting and filtering the
seasonal factor, fewer forecasting errors and better forecasting results can be
obtained. Williams et al. also aggregated order forecasting and past order information,
executing processes, and other uncertain factors from POS data.19 Their model fil-
tered the influence and errors of the bullwhip effect and proposed a long-term balanced
inventory strategy.
Some researchers studied the effect of customers' behaviors based on POS data.
Duchessi et al. used POS data to group retail grocery chain customers based on their
similar purchasing behavior by geodemography.9 Kashima et al. developed a rec-
ommendation method that integrated a POS system and an automatic order func-
tion for a restaurant.20 The developed system recommended a menu to customers
based on the customers' profiles analyzed from POS data. Ogawa et al. used POS
data to analyze customers' purchasing behaviors for vegetables and fruits.2 The sales
data revealed that point-of-purchase health information could be effective in fostering
customers' healthy dietary habits. Aloysius et al. also investigated the service process
perceptions of shopping outcomes in the retail environment by using "mobile" POS
data carried by smartphones.21 They found that the technology enablers of
mobile POS systems in scanning and payment scenarios give retailers a competitive
advantage in the era of big data. All of the mentioned research utilized certain data
analysis methods or analysis frameworks to conduct studies on POS data.
Although POS data have been used in a variety of research domains, little
research, to our knowledge, focuses on developing a particular data mining
technique for analyzing POS data to deal with the analysis of retailing performance
and the corresponding factors. Therefore, this study aims to develop a new data
analysis framework that combines data clustering and data classification to analyze
POS data. The proposed technique is expected to apply to general POS datasets. In
this work, a case study of a bakery retail store chain is demonstrated as an example
of the utilization of the developed framework.

2.2. Review of data clustering and data classification

To cluster retail stores by revenue performance and further investigate the influ-
ential factors, clustering and feature selection for classification methods are com-
monly applied in analyzing POS data. Data clustering and data classification are the
two most popular data mining techniques. The clustering method is known as
an unsupervised learning method because clustering is performed without available
information relating the data items to predetermined labels.22
Soni and Ganatra explained that clustering is exploratory in nature and its objective is
to find structure in a dataset.23 The clustering process categorizes similar data objects
into a group called a cluster. The similarity measure introduces the compactness
of clusters as a performance measure for clustering. Generally, clustering algorithms
are divided into two types: hierarchical clustering and partitioning clustering. The
hierarchical clustering approach organizes the dataset into a hierarchy and typically
presents the results as dendrograms or trees. Partitioning clustering, such as the
k-means method, divides large data into a smaller number of groups based on similarity.
To evaluate the clustering result, cluster validation indices such as
accuracy, Rand index, entropy, and purity are used. Besides, Kou et al. employed a multi-
criteria decision making (MCDM) method to evaluate clustering performance.24
Contrary to data clustering, data classification methods assign classes or labels to
new data records based on the existing data.22 There are two steps in data classifi-
cation. The first step is to create a model with the given data, also known as the
training data. The second step is to use the created model to predict the class label
for incoming unknown data based on the selected features. The complexity of
these steps depends on the number of features used as classifiers. Some popular
classification methods from before the deep learning era, such as decision tree (DT),
artificial neural network (ANN), k-nearest neighbor (KNN), and support vector
machine (SVM), have been used widely,25 and the recently emerging deep learning
methods are popular.26 Commonly, clustering and classification are implemented sep-
arately with different models. Some researchers combined clustering and classifica-
tion to improve the performance of the target task. For instance, some studies used
clustering to improve classification results, such as (1) combining k-means with
decision tree learning,27 (2) clustering ensembles with SVM,28 (3) k-means with
SVM and regression,29 (4) multiple clustering approaches such as k-means, fuzzy
c-means, and hierarchical clustering with SVM and ANN,30 and (5) ensemble clas-
sification-based supervised clustering (ECSC).31 Besides, some research works con-
sidered applying classification methods to cluster the classified results. Those studies
include a combination of k-means and SVM in a supervised k-means clustering method
(SSVM),32 and the integration of an SVM classifier with multi-objective genetic
algorithm (MOGA) clustering.33 Moreover, multiple studies try to perform clustering and
classification simultaneously or sequentially in one model to utilize both data mining
methods. Cai et al. proposed a complicated multi-objective simultaneous learning
framework for clustering and classification (MSCC) in which both the clustering and
classification models can rely on the clusters' centers.34 Bian et al. proposed a joint
clustering and classification technique for remote-sensing images that combines
bipartition-based k-means with an SVM classifier.35 To simply present the results of
clustering and classification, Yang and Quyen proposed a sequential clustering and
classification data analysis framework that clusters one dataset and classi-
fies the clustering result using another dataset to reveal the correlation between the two
datasets.11 To handle the multi-objective optimization, the multi-objective genetic
algorithm (MOGA) is suggested in the literature. There are multiple versions of
MOGA, such as the vector evaluated genetic algorithm (VEGA),36 the non-dominated
sorting genetic algorithm (NSGA),37 the fast and elitist non-dominated sorting genetic
algorithm (NSGAII),38 and so on. Due to the superiority of multi-objective optimization,
in Yang and Quyen's work, the fast and elitist non-dominated sorting genetic algorithm
(NSGAII) was used to combine clustering with classification in a learning framework to
ensure the objective functions of clustering and classification complement each other.
Although NSGAII-SCC has a straightforward sequential framework
and promising performance, how to conclude the relationships between the
results of clustering and classification based on the multiple solutions generated from
the Pareto front is needed for POS data analysis. In this research, TOPSIS and
stepwise regression analysis were applied to extend the original NSGAII-SCC
framework to resolve the solution selection issue. Detailed information regarding
the extension of TOPSIS and the stepwise regression analysis can be found in the
following section.

3. Methodology
In this study, sequential clustering and classification are utilized to analyze the
performance of a retail chain based on the store sales patterns. The framework of
sequential clustering and classification using the fast and elitist non-dominated
sorting genetic algorithm (NSGAII-SCC)11 for POS data analysis is illustrated in
Fig. 1. Essentially, there are three stages in this framework. In the first stage, the retail
POS data are processed to identify and separate data features into two different
datasets: a sales dataset (Q dataset) for clustering and a store features dataset
(X dataset) for classification. The second stage performs clustering and classification
sequentially on the two datasets identified in the first stage. Herein, clustering on the Q
dataset is first performed to exploit the data patterns. Then, the class labels obtained
from the clustering result are used for training the classifier on the X dataset. Besides,
NSGAII is employed in the second stage to iteratively search for the optimal solution.
In addition, compared with the original NSGAII-SCC,11 an extension of the data
analysis is proposed in this stage to evaluate the feature selection mechanism by the
stepwise regression method. Essentially, the chromosomes of each generation of
sequential clustering and classification are recorded. Then, a stepwise regression model
is applied to eliminate the redundant and non-significant features for classification
based on the multiple solutions. Finally, the third stage analyzes the output results,
which reveal the data patterns and identify the features correlated with the sales
patterns.

Fig. 1. Framework of NSGAII-SCC on retail POS data with the stepwise regression analysis for feature
selection and TOPSIS for solution selection.

3.1. Stage 1: Data features identification

POS data contain transactional records of sales, which usually consist of two parts of
information. One part is the sale records containing date, time, items, transaction
number, prices, and so on, forming the Q dataset mentioned in Fig. 1. The other part is
information regarding the store features, such as store identification number, store
geometry size, and region information, forming the X dataset mentioned in
Fig. 1. These data are saved in a POS system and are retrievable to analyze sales
patterns, profit, number of customers, and others.
The sale transaction data are usually recorded in time series format in POS.
The data can be aggregated daily, weekly, monthly, or annually based on the re-
search purpose. In this study, the sales data are aggregated monthly by store iden-
tification number. Please note that the aggregation can be conducted over different
time durations such as daily or weekly without losing generality. For example, in
our case study, the Q dataset was aggregated as the total sales transactions per
month of 54 retail bakery stores, each of which sold more than 500 products.
The second dataset X can include store characteristics such as store geometry size,
location of the store, and surrounding facilities, for which the manager might be
interested in finding the association with the sales performance of the store. These
two datasets Q and X will be used for clustering and classification, respectively, for
investigating store features that correlate with the sales performance of certain
groups of stores. The process to identify data
features for each dataset is illustrated in Fig. 2.

Fig. 2. Illustration of identifying data features for clustering and classification of retail POS data.
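The Stage-1 split can be sketched with pandas. All column names (`store_id`, `date`, `amount`) and the store attributes below are illustrative assumptions, not fields from the paper's dataset; the point is only the monthly aggregation into a Q matrix and the separate X table.

```python
import pandas as pd

# Hypothetical raw POS transactions; column names are illustrative only.
pos = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 1, 2],
    "date": pd.to_datetime(["2019-01-05", "2019-02-10", "2019-01-20",
                            "2019-02-28", "2019-02-15", "2019-01-07"]),
    "amount": [120.0, 80.0, 200.0, 150.0, 60.0, 90.0],
})

# Q dataset: monthly sales per store (rows = stores, columns = months).
q = (pos.assign(month=pos["date"].dt.to_period("M"))
        .groupby(["store_id", "month"])["amount"].sum()
        .unstack(fill_value=0.0))

# X dataset: store features kept separately for the classification step.
x = pd.DataFrame({
    "store_id": [1, 2],
    "floor_area_m2": [85, 120],   # illustrative store attributes
    "near_station": [1, 0],
}).set_index("store_id")

print(q)
print(x)
```

In the paper's case study the Q matrix would have 54 rows (one per bakery store) and 12 monthly columns, while X holds the store characteristics.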

3.2. Stage 2: Sequential clustering and classification based on NSGAII

NSGAII-SCC, the sequential clustering and classification framework proposed in
Ref. 11, can be applied after separating the POS data into the two datasets Q and X.
Figure 3 illustrates the NSGAII-SCC framework with the proposed stepwise re-
gression and TOPSIS methods. First, the initial population, which consists of two
types of chromosomes, one for clustering and another for classification, is gen-
erated. Feature selection is processed based on the chromosome design. Data clus-
tering is first implemented based on the selected features of the Q dataset. The class
labels obtained from the clustering result are then used to train the classifier. A
multi-objective function, which considers the mean square error (MSE) and accuracy
as the clustering and classification performance measures, respectively, is calculated
for each chromosome. Then, genetic operations such as selection, crossover, and
mutation on feature chromosomes are applied to search for the solutions for
clustering and classification. As in the original NSGAII method
proposed by Deb et al.,38 non-dominated sorting and crowding distance are com-
puted to select populations for the next generation. This process of the genetic
algorithm (GA) repeats until the stopping criterion is met. The output solutions
on the Pareto front represent the solution candidates. In addition, the chromosomes
of each generation are recorded for stepwise regression analysis to study factors
affecting classification accuracy. TOPSIS is then applied to select the solution for
concluding the analysis. The details of NSGAII-SCC with stepwise regression feature
selection and TOPSIS are described in the following subsections.

Fig. 3. Illustration of sequential clustering and classification data analysis framework with stepwise
regression feature selection and determination of the number of clusters by TOPSIS.

3.2.1. Chromosome representation

Xue et al.39 proposed a multi-objective-based NSGA for feature selection. In their
method, a chromosome is encoded as a vector of real numbers in the range [0, 1],
where the length of the vector is the number of features in the dataset. To identify
which features are selected, the chromosome is transformed into a binary vector in
which gene 1 denotes a selected feature while gene 0 represents an unselected
feature. This study employs a similar approach in chromosome design. A chromo-
some is coded as a string of binary numbers in which each bit represents one data
feature. Therefore, the length of a chromosome depends on the number of features in
a dataset. For instance, in this study, clustering chromosomes are expressed as 12-bit
strings corresponding to the 12 features in the sales dataset Q. One clustering
chromosome sample is {010010110011}. In this case, the genes with the value {1}
appear in the 2nd, 5th, 7th, 8th, 11th, and 12th positions of the chromosome, meaning
that the corresponding features at these positions in the sales dataset Q will be
selected to perform clustering. This concept of chromosome representation is ap-
plied similarly to the feature dataset X for classification.
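Decoding a binary chromosome into a feature subset can be sketched in a few lines. The 12-bit chromosome below is the sample from the text; the Q matrix itself is synthetic, assumed only to match the 54-store, 12-feature shape of the case study.

```python
import numpy as np

# The sample clustering chromosome from the text: {010010110011}.
chromosome = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1])

# 1-based positions of the selected features.
selected_positions = np.flatnonzero(chromosome) + 1
print(selected_positions.tolist())  # [2, 5, 7, 8, 11, 12]

# Applying the mask to a synthetic Q matrix of 54 stores x 12 features:
rng = np.random.default_rng(0)
Q = rng.random((54, 12))
Q_selected = Q[:, chromosome.astype(bool)]
print(Q_selected.shape)  # (54, 6)
```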

3.2.2. Sequential clustering and classification

Feature selection
The feature selection mechanism is conducted by designing a chromosome in the GA
to indicate the attributes that will be used for clustering and classification. The
selected attributes from the sales dataset Q indicated by the chromosome are trained
using a clustering method to reveal hidden data structures. The clustering result for
each store is labeled in the store features dataset X. The selected attributes in dataset X
indicated by the chromosome are used as inputs to classify the clustering labels in
the classification process.

Sequential clustering and classification
After selecting data features from the chromosome set, clustering and classification
algorithms are performed sequentially as shown in Fig. 3. The selection of clustering
and classification methods in this combined model is an important research
question. This study conducted a prior experiment on the given POS dataset with
two clustering methods (hierarchical and k-means) and four classification
methods, i.e., DT, ANN, KNN, and SVM, mentioned in the literature review in
Sec. 2.2. The prior experiment result will be shown in Sec. 4.2.

Fitness function
The NSGAII-SCC framework assesses the performance quality of both the clustering
and classification tasks. For clustering, MSE is used to evaluate the compactness of
clusters. Similarly, the classification performance is measured by the prediction ac-
curacy (acc). On the one hand, the clustering task aims to minimize MSE, which
means that the clusters are more compact. On the other hand, classification seeks to
maximize acc to increase prediction accuracy. To be consistent with the clustering
performance index, the reciprocal of acc, denoted as 1/acc, is used to
measure the performance of classification. Therefore, the fitness function
considers minimizing both MSE and 1/acc.
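The sequential evaluation of one chromosome pair can be sketched as below: cluster the stores on the selected Q features, then classify the resulting labels from the selected X features, returning (MSE, 1/acc). This is a sketch, not the authors' code: scikit-learn's k-means and KNN stand in for the candidate methods compared in Sec. 4.2, and all data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(q, x, clus_mask, clas_mask, k=3, seed=0):
    """Evaluate one chromosome pair as (MSE of clustering, 1/acc of classification).

    q, x      : sales matrix and store-feature matrix (rows = stores)
    clus_mask : boolean mask over Q's columns (clustering chromosome)
    clas_mask : boolean mask over X's columns (classification chromosome)
    """
    # Step 1: cluster stores on the selected sales features.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(q[:, clus_mask])
    # Clustering objective: mean squared distance to the assigned centroid.
    mse = km.inertia_ / q.shape[0]

    # Step 2: classify the cluster labels from the selected store features.
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=3),
                          x[:, clas_mask], labels, cv=3).mean()
    return mse, 1.0 / max(acc, 1e-9)  # both objectives are minimized

rng = np.random.default_rng(1)
q = rng.random((54, 12))   # 54 stores x 12 monthly sales features (synthetic)
x = rng.random((54, 8))    # 54 stores x 8 store attributes (synthetic)
clus = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1], dtype=bool)
clas = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=bool)
mse, inv_acc = fitness(q, x, clus, clas)
print(mse, inv_acc)
```

In the full framework this function would be called once per chromosome pair in each NSGAII generation.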

3.2.3. Genetic operations

To produce the next generation in the genetic algorithm, genetic operations are
conducted in the next steps. Commonly, genetic operations consist of selection,
crossover, and mutation. This study follows the design of the original NSGAII38 for
these tasks. For instance, the selection operation uses the tournament method to select
chromosomes from the population. Crossover is conducted based on the conventional
one-point crossover. The mutation operation changes one random binary gene from 1
to 0 or vice versa.
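The three operators named above can be sketched as follows. This is a generic implementation of tournament selection, one-point crossover, and bit-flip mutation on bit-string chromosomes, not the authors' code; the toy fitness (`sum`, i.e., fewer selected features is "better") is an assumption for the demo only.

```python
import random

def tournament_select(pop, fitness, size=2):
    """Binary tournament: return the better of `size` random individuals.
    `fitness` maps a chromosome (tuple of bits) to a value to minimize."""
    contestants = random.sample(pop, size)
    return min(contestants, key=fitness)

def one_point_crossover(a, b):
    """Swap the tails of two bit-strings at a random cut point."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom):
    """Flip one randomly chosen bit."""
    i = random.randrange(len(chrom))
    return chrom[:i] + (1 - chrom[i],) + chrom[i + 1:]

random.seed(0)
a = (0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1)
b = (1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1)
c1, c2 = one_point_crossover(a, b)
winner = tournament_select([a, b, c1, c2], fitness=sum)  # toy fitness
print(c1, c2, winner)
```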

3.2.4. Non-dominated sorting

The non-dominated sorting procedure is a crucial task to evaluate the solutions of
clustering and classification. The elitist non-dominated sorting combines popula-
tions so that the solutions are not trapped in a local optimum. The offspring chro-
mosomes, which are reproduced after implementing the genetic operations, are applied
to the sequential clustering and classification algorithm to obtain the fitness values.
Then, the procedure follows the original algorithm of Deb et al.38 Parent chromo-
somes are combined with offspring chromosomes to conduct non-dominated sorting
and calculate the crowding distance to select the next generation. Figure 4 illustrates
the non-dominated sorting procedure of the NSGAII-SCC framework.11
The process of the GA repeats until the stopping criterion is met. The output
solutions on the Pareto front represent the solution candidates.

Fig. 4. Non-dominated sorting procedure of the NSGAII-SCC framework.
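Non-dominated sorting rests on the Pareto dominance test between fitness vectors. A minimal sketch for the two minimized objectives (MSE, 1/acc), with hypothetical fitness values, is:

```python
def dominates(u, v):
    """u dominates v if u is no worse in every objective and strictly better
    in at least one (both objectives minimized, e.g. (MSE, 1/acc))."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(other, p) for other in points if other != p)]

# Hypothetical (MSE, 1/acc) values for six candidate solutions:
pts = [(0.9, 1.1), (0.5, 1.4), (0.7, 1.2), (1.2, 1.05), (0.5, 1.6), (0.8, 1.3)]
print(pareto_front(pts))
```

The full NSGAII procedure extends this test into ranked fronts plus a crowding-distance tie-breaker, as in Deb et al.38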

3.2.5. Feature selection by stepwise regression analysis


After the optimization process is ¯nished, the algorithm can obtain the optimal
results of the clustering and classi¯cation based on their ¯tness values. However,
multiple solutions generated by NSGAII-SCC technique should be investigated to
¯nd the correlated factors from the X dataset that a®ect the revealed patterns. Thus,
this step focuses on exploring the in°uence factor on classi¯cation accuracy based on
the feature selection mechanism.
First, the chromosomes for classification of each generation are recorded during the iterative process of the sequential clustering and classification. Then, stepwise regression15 is utilized to analyze the influential features. The stepwise regression method examines each variable to eliminate the non-significant features and keep the appropriate ones for the fitted model. Suppose that the population size and number of generations are 50 and 100, respectively; a total of 5000 chromosomes for classification are then recorded. To build the stepwise regression models, acc is defined as the dependent variable while the features of the X dataset are defined as independent variables. This technique takes a step-by-step approach to selecting variables: independent variables are added or eliminated according to their statistical significance. A final model is adopted when no more independent variables can be added to or deleted from the stepwise model. Figure 5 illustrates the framework of the stepwise regression mechanism on the chromosomes for classification. As can be seen, the values of 1/acc are the dependent variables of the stepwise regression model while the features of the X dataset are considered the independent variables. After performing stepwise regression, the selected features are kept for reasoning about the relationship between the X and Q datasets.
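A minimal sketch of this selection loop is given below. It uses greedy forward selection with adjusted R^2 as the inclusion criterion (a stand-in for the p-value tests of textbook stepwise regression); the data, coefficients, and threshold are illustrative, not taken from the case study:

```python
import numpy as np

def adj_r2(Xs, y):
    """Adjusted R^2 of an OLS fit of y on the selected columns Xs (with intercept)."""
    n = len(y)
    A = np.column_stack([np.ones(n), Xs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    p = Xs.shape[1]
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def stepwise_forward(X, y, min_gain=1e-3):
    """Greedily add the feature that most improves adjusted R^2; stop when no
    candidate improves it by more than min_gain."""
    selected, best = [], -np.inf
    improved = True
    while improved:
        improved = False
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: adj_r2(X[:, selected + [j]], y) for j in candidates}
        if scores:
            j_best = max(scores, key=scores.get)
            if scores[j_best] > best + min_gain:
                selected.append(j_best)
                best = scores[j_best]
                improved = True
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only features 1 and 4 actually drive the response in this toy example
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=200)
selected = stepwise_forward(X, y)
```

In the paper's setting, y would correspond to the recorded 1/acc values and the columns of X to the store-feature dataset.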

Fig. 5. The framework of stepwise regression with feature elimination.

3.2.6. Determination of the number of clusters by TOPSIS

To determine the number of clusters k based on the Pareto front set, this study integrates the TOPSIS method to select the solution with the largest relative closeness considering the weights of different criteria. Essentially, TOPSIS is a multi-criteria decision analysis method that selects the alternative with the shortest geometric distance from the positive ideal solution and the longest geometric distance from the negative ideal solution.40 A set of alternatives is compared by (1) identifying weights for each criterion, (2) normalizing scores for each criterion, and (3) calculating the geometric distance between each alternative and the ideal solution, which has the best score in each criterion.
In addition to the two objective functions MSE and 1/acc, two more criteria are derived from the number of selected features in the non-dominated solutions. For each solution, the frequency proportion of the clustering chromosome (FR-clus) is calculated by dividing the number of selected features by the total number of features in the clustering chromosome. Similarly, the frequency proportion of the classification chromosome (FR-class) is the number of selected features divided by the total number of features in the classification chromosome. The objective of considering FR-clus and FR-class is to find the solutions that use a relatively smaller number of features, following the principle of Occam's razor in data science (simple is better).41 After assigning weights to MSE, 1/acc, FR-clus, and FR-class, the TOPSIS method is used to determine the relative closeness of each solution. Then, the number of clusters associated with the solution with the largest relative closeness is selected. This determination is crucial for further analyzing the relationship between store performance and store features in the POS data.
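The relative-closeness computation can be sketched as follows. The weights (0.3, 0.2, 0.2, 0.3) mirror those used later in the case study, but the three candidate solutions below are invented for illustration:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives by relative closeness to the ideal solution.
    matrix: alternatives x criteria; benefit[j] is True if criterion j is maximized."""
    M = np.asarray(matrix, dtype=float)
    norm = M / np.linalg.norm(M, axis=0)      # vector normalization per criterion
    V = norm * np.asarray(weights)            # weighted normalized scores
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)            # relative closeness in [0, 1]

# Hypothetical candidate solutions; columns are (MSE, FR-clus, FR-class, acc)
cands = [[7.66, 0.42, 0.62, 0.96],   # k = 2
         [2.02, 0.42, 0.46, 0.85],   # k = 5
         [0.33, 0.42, 0.77, 0.64]]   # k = 10
cc = topsis(cands, weights=[0.3, 0.2, 0.2, 0.3],
            benefit=[False, False, False, True])
best = int(np.argmax(cc))
```

With the cost criteria (MSE, FR-clus, FR-class) minimized and the benefit criterion (acc) maximized, the middle candidate obtains the largest relative closeness in this toy instance.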

3.3. Stage 3: Result analysis

After determining the number of clusters, further analysis can be conducted to investigate the optimal solution for POS analysis. Essentially, the constructed clusters can show the sales pattern in each group of stores. A similar seasonal effect on revenue can be easily revealed by the clustering results. The classification result presents the store features that are correlated to the sales pattern of each group of stores. This analysis is beneficial for improving sales performance and for decision-making purposes. In the next section, a real-world case study is used to present the combined clustering and classification analysis.

4. Case Study
4.1. Data description and parameter setting
In this case study, POS data are collected from a bakery franchise in Hangzhou, China. The data are divided into two sets, one for clustering and the other for classification. The first set is the sales dataset Q, which contains aggregated monthly revenue by store. There are 54 bakery stores in this franchise, and each store sells more than 500 products. The second portion is the store features dataset X, which comprises characteristic information related to each store. The factors included in the store features dataset are as follows:

• Store size: Store size in square meters.
• Floor: The number of floors in the store (one, two, or three).
• Bus: The number of bus stations near the bakery store.
• Facility: Public facilities surrounding the store, such as MRT station, school, government office, department store, convenience store, residential area, and tourist attraction spot. This is a binary attribute: 1 means the public facility exists and 0 means it does not.
• Location: District location of the store. The districts are denoted by the numbers 1 to 10 as shown in Table 1.
• Population: Population of the district where the store is located.
• Population density: Population per unit area.
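For illustration only, one store record could be flattened into a numeric row of dataset X as sketched below; the field names and values are hypothetical, mirroring the factors listed above:

```python
# Facility indicators in a fixed order (hypothetical naming, mirroring the list above)
FACILITIES = ["MRT", "school", "government_office", "department_store",
              "convenience_store", "residential_area", "tourist_spot"]

def to_feature_vector(store):
    """Flatten one store record into the numeric row used in dataset X."""
    row = [store["store_size"], store["floors"], store["bus_stations"]]
    row += [1 if f in store["facilities"] else 0 for f in FACILITIES]
    row += [store["location"], store["population"], store["population_density"]]
    return row

# A made-up store near a school in a residential area of district 5
store = {"store_size": 75.0, "floors": 1, "bus_stations": 3,
         "facilities": {"school", "residential_area"},
         "location": 5, "population": 830000, "population_density": 3100}
vec = to_feature_vector(store)
```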

User-defined parameters for the proposed NSGAII-SCC are set as follows: number of generations = 100, population size = 100, crossover rate = 0.8, mutation rate = 0.1, weighting component = 1.2, and chromosome selection = 0.1. The number of generations is used as the stopping criterion for the GA. The number of clusters (k) ranges from 2 to 10. To evaluate the performance of the NSGAII-SCC framework on the POS data, the results are compared with the conventional sequential clustering and classification (SCC) without NSGA. As mentioned in the previous section, two validation indices (MSE and 1/acc) are used to measure the results. The clustering result is validated by the MSE index and the classification performance is evaluated by 1/acc.

Table 1. District names and district symbols of POS data.

Symbol  District name   Symbol  District name
1       Shang Cheng     6       Bin Jiang
2       Xia Cheng       7       Xiao Shan
3       Jiang Gan       8       Yu Hang
4       Gong Shu        9       Fu Yang
5       Xi Hu           10      Jia Xing

4.2. Prior experiment

Several data mining techniques can be used to perform clustering and classification. This prior experiment aims to investigate the best combination of clustering and classification methods for this POS data analysis. For clustering, k-means and hierarchical agglomerative clustering are selected. Regarding classification methods, DT, ANN, KNN, and SVM are chosen. The prior experiment is conducted on the given POS data using the conventional SCC method. Similar to the NSGAII-SCC framework, the conventional SCC first performs clustering on the sales data Q and obtains data labels from the clustering result to conduct classification on the store features dataset X. The experiment is repeated 20 times and the average MSE and 1/acc results are recorded. The clustering results of k-means and hierarchical agglomerative clustering are compared to select the best one. Similarly, the ANN, KNN, SVM, and DT results are compared to choose the method that gives the best results on the given POS dataset. The results are shown in Table 2 and plotted in Figs. 6 and 7.

Table 2. Results of clustering and classification based on SCC.

        MSE (×10^8)              1/acc
k    Hierarchical   k-means   ANN    KNN    SVM    DT
2    18.33          14.01     1.30   1.22   1.21   1.11
3    9.46           8.20      1.32   1.35   1.40   1.13
4    8.64           5.40      1.29   1.44   1.66   1.18
5    8.26           4.02      1.40   1.53   1.96   1.19
6    7.90           3.48      2.32   1.65   1.68   1.25
7    3.53           2.74      1.97   1.77   2.16   1.49
8    3.02           2.29      2.27   2.03   1.92   1.52
9    2.80           2.01      2.55   1.98   1.80   1.60
10   1.70           1.67      2.14   2.35   1.87   1.66

Fig. 6. Comparison of clustering methods based on bakery POS data.
Figure 6 shows that k-means performs better than hierarchical clustering for all values of k. For two to six clusters, there is a significant difference in MSE between k-means and the hierarchical method. However, the difference decreases when the number of clusters exceeds six. Based on this result, k-means is selected to perform the clustering task on the given POS dataset.
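The conventional SCC baseline used in this prior experiment can be sketched as follows. This is a toy illustration: Lloyd's k-means with farthest-first seeding stands in for the clustering step, a nearest-centroid classifier stands in for DT, the data are synthetic, and MSE is taken here as the mean squared distance of each point to its cluster center (the paper's exact index may be scaled differently):

```python
import numpy as np

def kmeans(Q, k, iters=50):
    """Lloyd's k-means with farthest-first initialization; returns labels and
    MSE (mean squared distance of each point to its cluster center)."""
    centers = [Q[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Q - c, axis=1) for c in centers], axis=0)
        centers.append(Q[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(Q[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Q[labels == j].mean(axis=0)
    sq = np.min(np.linalg.norm(Q[:, None, :] - centers[None, :, :], axis=2) ** 2, axis=1)
    return labels, float(sq.mean())

def nearest_centroid_acc(X, labels):
    """Training accuracy of a nearest-centroid classifier on store features X
    (a simple stand-in for the decision tree used in the paper)."""
    classes = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in classes])
    pred = classes[np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2).argmin(axis=1)]
    return float((pred == labels).mean())

rng = np.random.default_rng(1)
# Toy stand-ins for the 54-store monthly-revenue matrix Q and store-feature matrix X
Q = np.vstack([rng.normal(0, 1, (27, 12)), rng.normal(8, 1, (27, 12))])
X = np.vstack([rng.normal(0, 1, (27, 7)), rng.normal(5, 1, (27, 7))])
labels, mse = kmeans(Q, k=2)   # cluster sales data Q, keep the labels...
acc = nearest_centroid_acc(X, labels)  # ...then classify the store features X
```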
Similarly, Fig. 7 illustrates the comparison of the four classification methods using the POS data. DT performs better than the other methods in terms of 1/acc (smaller is better). In most cases, the 1/acc index rises steadily as the number of clusters increases. This indicates that the classification technique becomes less accurate when there are more class labels (k).

Fig. 7. Comparison of classification methods based on bakery POS data.


Table 3. Solutions of the first Pareto front using NSGAII-SCC with their corresponding features.

                                                                               Number of selected features
k    MSE (×10^8)  1/acc   Chromosome for clustering  Chromosome for classification  Clustering  Classification
2    8.051        1.019   010110000011               0111100001111                  5           8
2    7.567        1.038   110010001011               1111010001111                  6           9
2    7.357        1.059   100011000001               1101100111111                  4           10
3    4.894        1.059   100000110010               1011011111110                  4           10
4    4.366        1.104   010010010101               1110011000110                  5           7
4    3.580        1.122   100010010011               1000001011010                  5           5
5    2.022        1.181   010010110011               1101010000001                  6           5
6    1.372        1.217   100010000101               1001111000110                  4           7
6    1.169        1.270   100001100011               1100101010001                  5           6
7    0.598        1.317   101010000011               1011100110111                  5           9
8    0.527        1.409   100010010001               1111100010011                  4           8
9    0.401        1.421   101010100010               1111001011111                  5           10
10   0.392        1.503   100010001011               1111011110101                  5           10
10   0.267        1.616   100100010011               1011100000110                  5           6

Based on the results of this prior experiment, this study selects k-means and DT to perform the clustering and classification tasks, respectively, for the POS data using the NSGAII-SCC framework. The results of NSGAII-SCC are compared with SCC to evaluate the performance of the proposed framework.

4.3. Comparison of NSGAII-SCC and SCC results

The proposed NSGAII-SCC framework provides a set of non-dominated solutions, also known as the Pareto front. In this experiment, the number of clusters k is assigned from 2 to 10. For each value of k, the clustering task is performed to obtain the corresponding label L. L is then used to train the classifier in the classification task. As a result, NSGAII-SCC provides the Pareto frontier for different values of k. Note that, for each value of k, there may be several non-dominated solutions, as shown in Table 3. The average result of NSGAII-SCC for each value of k is taken to make a comparison with the conventional SCC.
Table 3 exhibits the Pareto front solutions with the corresponding chromosomes that identify the selected features. MSE and 1/acc are the validity indices of clustering and classification, respectively; smaller values of MSE and 1/acc indicate better results. In a binary chromosome, "1" means the feature is included in the training data while "0" means it is excluded. As shown in Table 3, the binary indicators in the chromosomes specify the features used for conducting clustering and classification. During the GA process, multiple combinations of feature selections are evaluated, and only the non-dominated (Pareto) solutions are presented as candidates.
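Decoding a chromosome into its feature subset is straightforward; the sketch below reads the first k = 2 clustering chromosome from Table 3 (the feature names are our own labels for the 12 monthly revenues):

```python
def decode(chromosome, feature_names):
    """Map a binary chromosome string to the subset of features it selects."""
    return [name for bit, name in zip(chromosome, feature_names) if bit == "1"]

# The 12 genes of a clustering chromosome correspond to the 12 monthly revenues
months = [f"month_{m}" for m in range(1, 13)]
selected = decode("010110000011", months)  # first Pareto solution for k = 2 in Table 3
```

The five selected genes match the "5" reported in the last-but-one column of that row of Table 3.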
Table 4 shows the comparison of the NSGAII-SCC and SCC frameworks in terms of MSE and 1/acc. The clustering result of NSGAII-SCC is significantly better than that of SCC due to the smaller values of MSE. Also, NSGAII-SCC has a smaller 1/acc than SCC, which means NSGAII-SCC has better classification accuracy. Based on a paired t-test with a 95% confidence interval comparing NSGAII-SCC with SCC, the statistical result confirmed the superiority of NSGAII-SCC in terms of smaller MSE and 1/acc.

Table 4. Results of the NSGAII-SCC and SCC frameworks based on bakery POS data.

         NSGAII-SCC               SCC
k     MSE (×10^8)  1/acc    MSE (×10^8)  1/acc
2     7.66         1.04     14.01        1.11
3     4.89         1.06     8.20         1.13
4     3.97         1.11     5.40         1.18
5     2.02         1.18     4.02         1.19
6     1.27         1.24     3.48         1.25
7     0.60         1.32     2.74         1.49
8     0.53         1.41     2.29         1.52
9     0.40         1.42     2.01         1.60
10    0.33         1.56     1.67         1.66
Mean  2.41         1.26     4.87         1.35

Fig. 8. Solutions of the first Pareto front using the NSGAII-SCC and SCC methods based on bakery POS data.
Figure 8 further illustrates the solutions of the NSGAII-SCC and SCC methods based on the bakery POS data. In Fig. 8, the red line with square markers shows the solutions of the first Pareto front using NSGAII-SCC, while the blue line with circle markers shows the solutions of the first Pareto front using SCC. The red line with square markers is closer to the origin (bottom-left corner), which means NSGAII-SCC obtains smaller MSE and 1/acc simultaneously. Once again, this result shows that the proposed NSGAII-SCC framework outperforms SCC in terms of both clustering and classification performance. It also means NSGAII-SCC can search for better clustering and classification solutions when conducting POS analysis for revenue performance investigation.
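The paired t-test reported above can be reproduced from the MSE columns of Table 4 (a sketch; the paper may also pair the 1/acc values):

```python
import math

def paired_t(a, b):
    """Paired t statistic for matched samples a and b (here, MSE per value of k)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)

# MSE (x10^8) per k = 2..10, taken from Table 4
mse_scc = [14.01, 8.20, 5.40, 4.02, 3.48, 2.74, 2.29, 2.01, 1.67]
mse_nsga = [7.66, 4.89, 3.97, 2.02, 1.27, 0.60, 0.53, 0.40, 0.33]
t = paired_t(mse_scc, mse_nsga)
T_CRIT = 2.306  # two-sided 5% critical value of Student's t with df = 8
significant = t > T_CRIT
```

With these nine pairs the t statistic clearly exceeds the critical value, consistent with the significance reported in the text.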

4.4. Solution selection for the number of clusters k

As mentioned in the methodology section, the TOPSIS method is applied here to determine the number of clusters k based on the solutions generated by the sequential clustering and classification. Table 5 lists the comparison of the numbers of clusters by the TOPSIS calculation process. The first set contains benefit attributes for maximization (more is better). Another set comprises negative attributes for minimization (less is better). In this study, acc is selected as the benefit attribute while MSE, FR-clus, and FR-class are the negative attributes. All attribute scores are normalized and presented in Table 5. The TOPSIS method calculates the relative closeness to the ideal solution and then ranks the solutions. As can be seen in Table 5, the first-ranked solution, with k = 5, is selected. It also means that, when clustering the revenue performance of the stores, five groups of stores can be established for better analysis due to the relevant clustering and classification results combined in the Pareto front solutions.

Table 5. Comparison of the number of clusters (k) by the TOPSIS method.

Weight  0.3    0.2      0.2       0.3
k       MSE    FR-clus  FR-class  acc     Relative closeness  Ranking
2       1.000  0.417    0.615     1.000   0.438               11
2       0.938  0.500    0.692     0.949   0.436               12
2       0.911  0.333    0.769     0.898   0.435               13
3       0.594  0.333    0.769     0.898   0.564               10
4       0.527  0.417    0.538     0.792   0.585               8
4       0.426  0.417    0.385     0.752   0.643               5
5       0.225  0.417    0.462     0.629   0.720               1
6       0.142  0.333    0.538     0.559   0.713               2
6       0.116  0.417    0.462     0.465   0.688               3
7       0.042  0.417    0.692     0.387   0.662               4
8       0.033  0.333    0.615     0.250   0.625               6
9       0.017  0.417    0.769     0.234   0.611               7
10      0.016  0.417    0.769     0.129   0.581               9
10      0.000  0.417    0.462     0.000   0.564               10

4.5. POS data analysis

According to the result of the TOPSIS method, the solution with five clusters (k = 5) is selected for POS data analysis. In this case, the 54 bakery franchise stores are grouped into five clusters by monthly revenue. Figure 9 plots the sales revenue of each cluster to show the sales patterns among stores. Similar sales patterns within one cluster can be recognized. For example, most of the stores are grouped into Cluster 4, which had diminished revenue in the 2nd, 6th, and 8th months. Meanwhile, Cluster 5 has only one store, which has outstanding revenue each month. Cluster 3 contains stores with a significant decrease in revenue in the 7th month, which shows a unique pattern compared with the other groups. Cluster 1 and Cluster 2 are not very distinct from each other, but Cluster 2 has relatively larger variation. Based on this clustering analysis, the groups of stores with different sales patterns can be revealed for further investigation.

Fig. 9. Monthly revenue of each group of stores.

The chromosomes that represent the selected features help identify the important features for both clustering and classification. Following the selected solution at k = 5, the clustering chromosome is the string {010010110011}. This indicates that six features, among a total of 12 features in the revenue dataset, are chosen as the representative features to perform clustering. These six selected features are the revenues of the 2nd, 5th, 7th, 8th, 11th, and 12th months. These six months are significant in differentiating the retail stores shown in Fig. 9 because they represent the critical sales performance when comparing all stores. For instance, the 7th feature contributes to the significant pattern characteristic of Cluster 3. The 2nd feature is significant in most of the clusters. Therefore, clustering with the selected features leads to results close to those obtained using all of the data. It also means that the proposed framework provides not only a mechanism for determining the number of clusters considering the combined clustering and classification results but also a way of selecting the features that are most influential for store clustering.
Once the clustering method obtains the store labels, the store features are classified using the clustering labels as classification targets. Similar to the process of exploring the selected features for data clustering, the chromosome for classification corresponding to the optimal value of k determined by TOPSIS is analyzed. In addition, the stepwise regression method is applied to the classification chromosomes to eliminate the features that are redundant or non-significant to acc. The significance level is set as 0.1. Finally, the five most correlated factors, i.e., store size, department stores, residential area, location, and population density, can be used to classify the store cluster. The correlations between sales revenue and the associated store information for the five store clusters are described in Table 6. As can be seen, the store size, department stores, residential area, location, and population density of each store cluster have different magnitude levels. For example, Cluster 5 was identified as the high-sales-revenue store, which happens to have a larger store size in a high-population-density region. However, stores with bigger floor plans do not guarantee tremendous sales. There is non-negligible fluctuation in sales in Cluster 4, even though the stores in that cluster have medium store sizes. Based on the analysis result, the department stores nearby turn out to be a significant factor in identifying the sales performance of Cluster 4. Moreover, Cluster 3 can be recognized as a special cluster, located in the lowest-population-density districts #8 and #10, with relatively low sales in the 7th month. This analysis points out the stores located near elementary schools in suburban areas, whose low sales during summer can easily be related to the school break, when students do not need to go to school. Please note that although the analysis seems intuitive, the features (factors) mentioned above were identified by the proposed algorithm without any human intervention. The subsequent analysis shows that NSGAII-SCC can cluster the stores by sales revenue and classify the relevant features that correlate significantly with the clustering result (monthly revenue). Last but not least, the result shown in Table 6 demonstrates the useful information produced by the framework for retail store managers to appraise the performance of the store clusters based on the provided POS data.

Table 6. Correlations between sales revenue and the associated store information for the five store clusters derived by NSGAII-SCC.

                                    Cluster 1        Cluster 2               Cluster 3            Cluster 4            Cluster 5
Sales revenue                       Medium sales     Medium sales and        Sales decrease       Low to medium        High sales
                                                     more fluctuating        sharply in month 7   sales
Store size (m^2)                    70 to 78.5       44 to 48 and 79 to 130  ≤ 44                 48 to 70             ≥ 120
Location (District No. in Table 1)  #5               #2, #7                  #8, #10              #1, #3, #4, #9       #6
Department stores nearby (unit)     0                2                       0, 1                 0, 1, 3              1
Residential area (0/1)              0                1                       0                    0                    1
Population density (persons/km^2)   3000 to 3200;    800 to 1000;            ≤ 650                900 to 1000;         ≥ 10000
                                    9900 to 10000    1300 to 1400                                 1600 to 1700

4.6. Discussion
In this research, NSGAII-SCC was compared with the conventional SCC method, which has been widely used for data mining. Without considering the complexity of combining clustering and classification, SCC is a straightforward analytic approach that treats clustering and classification as separate tasks. For the analysis scenarios of retail revenue performance, the determination of the number of store clusters and the feature selection for both clustering and classification are confounded together. This means that, for any number of clusters determined first, the significant features for clustering and classification might differ if the analysis's effectiveness is to be guaranteed. Also, even if features are selected first for clustering and classification, the performance under different numbers of clusters might be diverse. By using the multi-objective framework with GA search operations, the solutions on the Pareto front provide a relatively small number of solution candidates for further investigation. The solutions also have non-dominated characteristics to ensure the effectiveness of the combined clustering and classification.
Without a doubt, NSGAII-SCC needs more computational effort because of the iterative processes under the GA framework. Generally speaking, NSGAII-SCC needs 20–100 times more computational time than SCC, depending on how the stopping criterion of the GA is set. However, this computational framework can be executed automatically on a database platform as a stored procedure, or on an analytics platform with data mining subroutines. The analytics framework of NSGAII-SCC pays off by not exhaustively searching all combinations, whose number is (k − 1) × (2^n − 1) × (2^m − 1), where k is the largest number of clusters, n is the number of features for clustering, and m is the number of features for classification. For example, in the case of the POS study shown in Table 3, NSGAII-SCC can generate the Pareto front without going through (10 − 1) × (2^12 − 1) × (2^13 − 1) = 301,879,305 combinations of clustering and classification with k = 2–10. The metaheuristic method proposed in NSGAII-SCC can thus still be considered computationally efficient.
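The count of exhaustive combinations avoided by the GA follows directly from the formula above:

```python
# (k_max - 1) candidate cluster numbers (k = 2..10), times all non-empty feature
# subsets of the n = 12 clustering features and the m = 13 classification features
k_max, n, m = 10, 12, 13
combinations = (k_max - 1) * (2 ** n - 1) * (2 ** m - 1)  # 301,879,305
```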
The setting of weights in the TOPSIS method also provides the capability for managerial adjustment. If the manager of a retail store is more concerned about the accuracy of the classification results, the weight of acc can be set higher. Then, the ranking for choosing the number of store clusters will tend to favor the solution with better classification accuracy. Similarly, if the manager would like to ensure that the clustering has better compactness, the weight of MSE can be set higher compared with the other decision variables.
In addition, the stepwise regression method is employed to eliminate the redundant and non-significant features among those correlated to the store patterns. The removed features may depend on the significance level. If the p-value of the t-test in the stepwise model is greater than the significance level for removal, the feature is eliminated from the stepwise model. Thus, setting a higher significance level reduces the probability of removing a feature from the stepwise model. In contrast, the significance level can be set to a small value to increase the probability of feature removal.

5. Conclusion and Future Research Directions

Analyzing retail POS data to understand store performance is critical for retail management. By investigating the pattern of sales, management can make better decisions on handling inventory or adjusting the list of items. In this research, the NSGAII-SCC framework was applied to discover the hidden sales patterns of stores in terms of sales revenue and to further reveal the correlation between sales revenue and store characteristics via a classification method. The NSGAII-SCC framework constructs a multi-objective optimization to maintain the clustering and classification results in terms of cluster compactness and classification accuracy, respectively. Through the recursive procedures of the GA operation, the chromosome settings representing the feature selection evolve to search for the optimal clustering and classification results and to select the correlated data features. The solutions of the Pareto front can be plotted to show the population of solutions found. Three analysis operations, including (1) the determination of the number of store clusters, (2) the feature selection in dataset Q for clustering store revenue performance, and (3) the feature selection in dataset X for classifying the store clusters, can be performed at once by the proposed NSGAII-SCC framework.
In order to conclude the multiple solutions from the Pareto sets generated by NSGAII-SCC, this work integrates the stepwise regression and TOPSIS methods to determine the significant features among the store characteristics and the number of clusters in store revenue, respectively. The integration of stepwise regression and TOPSIS enhances the usability of NSGAII-SCC by providing post-hoc data analysis. This integration also extends the NSGAII-SCC framework to resolve the solution and feature selection problem when an extremely large number of solutions might exist when applying it to real-world data.
To demonstrate the application of the NSGAII-SCC framework, the POS data of a bakery retail chain were used as a case study to evaluate the proposed framework. First, the revenue dataset derived from the POS data was denoted as the sales dataset Q that measures the stores' performance. Then, the relevant store information for each store was defined as the store features dataset X for the classification process. The proposed data analysis framework is substantial for investigating store sales performance and revealing the correlated factors. This analysis can be performed by NSGAII-SCC automatically, and the result has significant value for improving sales and developing business strategies. Additionally, the proposed framework can explicitly determine the best number of clusters, accounting for better clustering and classification performance, using TOPSIS. Finding the factors that correlate with sales performance based on the stepwise method can further provide managerial meaning when studying the POS data.
For future research, there are multiple directions to improve the analysis and the algorithm framework. First, the results can be compared with other multivariate analysis methods. Second, the algorithm framework can be integrated with a standard database or ERP system to verify its performance. Third, a data preprocessing technique can be developed to screen out unimportant features in either the sales dataset or the store features dataset to improve computational efficiency.

Acknowledgment
We appreciate the financial support from the Ministry of Science and Technology of Taiwan, R.O.C. (Contract No. 106-2221-E-011-106-MY3) and the "Center for Cyber-Physical System Innovation" from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. We also thank the Wang Jhan Yang Charitable Trust Fund (Contract No. WJY 2020-HR-01) and Vingroup Joint Stock Company (Vingroup JSC) via the Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.DA19 for their financial support.

References
1. A. G. Woodside and G. L. Waddle, Sales e®ects of in-store advertising, Journal of Ad-
vertising Research 15(3) (1975) 29–33.
2. Y. Ogawa, N. Tanabe, A. Honda, T. Azuma, N. Seki, T. Suzuki and H. Suzuki, Point-of-
purchase health information encourages customers to purchase vegetables: Objective
analysis by using a point-of-sales system, Environmental Health and Preventive Medicine
16(4) (2011) 239–246.
3. J. S. Zhu, POS data and your demand forecast, First Int. Conf. Information Technology
and Quantitative Management, 16–18 May 2013, Suzhou, China, pp. 8–13.
4. K. Sundararaman, J. Parthasarathi, G. S. V. Rao and S. N. Kumar, Baseline prediction of
point of sales data for trade promotion optimization, Communications and Information
Technology (ICCIT), 2012 Int. Conf., 2012, 26–28 June 2012, Hammamet, Tunisia,
pp. 17–20.
5. A. Banerjee and B. Banerjee, E®ective retail promotion management: Use of point of sales
information resources, Vikalpa 25(4) (2000) 51–60.
Sequential Clustering and Classi¯cation Approach 909

6. G. Kou, Y. Xu, Y. Peng, F. Shen, Y. Chen, K. Chang and S. Kou, Bankruptcy prediction
for SMEs using transactional data and two-stage multiobjective feature selection, Deci-
sion Support Systems 140 (2021) 113429.
7. M. Fisher and A. Raman, Using data and Big Data in retailing, Production and
Operations Management 27(9) (2018) 1665–1669.
8. M. Margaret Weber and S. Prasad Kantamneni, POS and EDI in retailing: An exami-
nation of underlying bene¯ts and barriers, Supply Chain Management: An International
Journal 7(5) (2002) 311–317.
9. P. Duchessi, C. M. Schaninger and T. Nowak, Creating cluster-speci¯c purchase pro¯les
from point-of-sale scanner data and geodemographic clusters: Improving category man-
agement at a major US grocery chain, Journal of Consumer Behaviour 4(2) (2004)
97–117.
10. T.-M. Choi, S. W. Wallace and Y. Wang, Big Data analytics in operations management,
Production and Operations Management 27(10) (2018) 1868–1883.
11. C.-L. Yang and N. T. P. Quyen, Data analysis framework of sequential clustering and
classi¯cation using non-dominated sorting genetic algorithm, Applied Soft Computing 69
(2018) 704–718.
12. M. Mahajan, P. Nimbhorkar and K. Varadarajan, The planar k-means problem is NP-
hard, Theoretical Computer Science 442 (2012) 13–21.
13. J. Cai, J. Luo, S. Wang and S. Yang, Feature selection in machine learning: A new
perspective, Neurocomputing 300 (2018) 70–79.
14. B. Uzun, M. Taiwo, A. Syidanova and D. Uzun Ozsahin, The technique for order of
preference by similarity to ideal solution (TOPSIS), in Application of Multi-Criteria
Decision Analysis in Environmental and Civil Engineering, eds. D. Uzun Ozsahin et al.
(Springer International Publishing, Cham, 2021), pp. 25–30.
15. B. Żogała-Siudem and S. Jaroszewicz, Fast stepwise regression based on multidimensional
indexes, Information Sciences 549 (2021) 288–309.
16. R. Croson and K. Donohue, Impact of pos data sharing on supply chain management: An
experimental study, Production and Operations Management 12(1) (2003) 1–11.
17. K. Stein, Point-of-sale systems for foodservice, Journal of the American Dietetic Asso-
ciation 105(12) (2005) 1861.
18. T. Li, G. Kou, Y. Peng and P. S. Yu, An integrated cluster detection, optimization,
and interpretation approach for ¯nancial data, IEEE Transactions on Cybernetics (2021)
1–14.
19. B. D. Williams, M. A. Waller, S. Ahire and G. D. Ferrier, Predicting retailer orders with
POS and order data: The inventory balance e®ect, European Journal of Operational
Research 232(3) (2014) 593–600.
20. T. Kashima, S. Matsumoto and H. Ishii, Recommendation method with rough sets
in restaurant point of sales system, in Proc. Int. MultiConf. Engineers and Computer Sci-
entists, Hong Kong, 2010, pp. 2018–2023, https://ieeexplore.ieee.org/document/9546664.
21. J. A. Aloysius, H. Hoehle, S. Goodarzi and V. Venkatesh, Big data initiatives in retail
environments: Linking service process perceptions to shopping outcomes, Annals of
Operations Research 270(1) (2018) 25–51.
22. P.-N. Tan, M. Steinbach, A. Karpatne and V. Kumar, Introduction to Data mining, 2nd
edn., What's New in Computer Science, (Pearson, NY, 2018).
23. N. Soni and A. Ganatra, Categorization of several clustering algorithms from di®erent
perspective: A review, International Journal of Advanced Research in Computer Science
and Software Engineering 2(8) (2012) 63–68.
24. G. Kou, Y. Peng and G. Wang, Evaluation of clustering algorithms for ¯nancial risk
analysis using MCDM methods, Information Sciences 275 (2014) 1–12.
910 C.-L. Yang & T. P. Q. Nguyen

25. S. B. Kotsiantis, I. D. Zaharakis and P. E. Pintelas, Machine learning: A review


of classi¯cation and combining techniques, Arti¯cial Intelligence Review 26(3) (2006)
159–190.
26. H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar and P.-A. Muller, Deep learning
for time series classi¯cation: A review, Data Mining and Knowledge Discovery 33(4)
(2019) 917–963.
27. C. Kaewchinporn, N. Vongsuchoto and A. Srisawat, A combination of decision tree
learning and clustering for data classification, in Int. Joint Conf. Computer Science and
Software Engineering, 2011, pp. 363–367.
28. L. F. Coletta, N. F. da Silva and E. R. Hruschka, Combining classification and clustering
for tweet sentiment analysis, in 2014 Brazilian Conf. Intelligent Systems (BRACIS),
IEEE, 2014, pp. 210–215.
29. X.-Y. Zhang, P. Yang, Y.-M. Zhang, K. Huang and C.-L. Liu, Combination of
classification and clustering results with label propagation, IEEE Signal Processing
Letters 21(5) (2014) 610–614.
30. V. Elyasigomari, M. S. Mirjafari, H. R. C. Screen and M. H. Shaheed, Cancer
classification using a novel gene selection approach by means of shuffling based on data
clustering with optimization, Applied Soft Computing 35 (2015) 43–51.
31. H. Xiao, Z. Xiao and Y. Wang, Ensemble classification based on supervised clustering for
credit scoring, Applied Soft Computing 43 (2016) 73–86.
32. T. Finley and T. Joachims, Supervised k-means clustering, Department of Computer
Science, Cornell University, Ithaca, NY, USA, 2008.
33. A. Mukhopadhyay, S. Bandyopadhyay and U. Maulik, Multi-class clustering of cancer
subtypes through SVM based ensemble of Pareto-optimal solutions for gene marker
identification, PLoS One 5(11) (2010) e13803.
34. W. Cai, S. Chen and D. Zhang, A multiobjective simultaneous learning framework for
clustering and classification, IEEE Transactions on Neural Networks 21(2) (2010)
185–200.
35. X. Bian, T. Zhang and X. Zhang, Combining clustering and classification for remote-
sensing images using unlabeled data, Chinese Optics Letters 9(1) (2011) 011002.
36. G. Kou, H. Xiao, M. Cao and L. H. Lee, Optimal computing budget allocation for the
vector evaluated genetic algorithm in multi-objective simulation optimization, Auto-
matica 129 (2021) 109599.
37. N. Srinivas and K. Deb, Muiltiobjective optimization using nondominated sorting in
genetic algorithms, Evolutionary Computation 2(3) (1994) 221–248.
38. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A fast and elitist multiobjective genetic
algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6(2) (2002) 182–
197.
39. Y. Xue, Y. Tang, X. Xu, J. Liang and F. Neri, Multi-objective feature selection with
missing data in classification, IEEE Transactions on Emerging Topics in Computational
Intelligence (2021) 1–10, https://ieeexplore.ieee.org/document/9420459.
40. G. Kabir, M. Ahsan and A. Hasin, Framework for benchmarking online retailing per-
formance using fuzzy AHP and TOPSIS method, International Journal of Industrial
Engineering Computations 3(4) (2012) 561–576.
41. A. Blumer, A. Ehrenfeucht, D. Haussler and M. K. Warmuth, Occam's razor, Information
Processing Letters 24(6) (1987) 377–380.