Second File To Upload
https://doi.org/10.1007/s12652-020-02015-w
ORIGINAL RESEARCH
Abstract
Data mining and in particular forecasting tools and techniques are being increasingly exploited by businesses to predict
customer behavior and to formulate effective marketing programs. Conventionally, customer segmentation approaches are
utilized when dealing with a large population of customers. Inspired by this idea, a new methodology is proposed in this
study to perform segment-level customer behavior forecasting. To keep the dynamic nature of customer behavior, customer
behavior is represented as a time series. Therefore, customer behavior forecasting is changed into a time series forecasting
problem. The proposed methodology contains two main components i.e. clustering and forecasting. In the clustering phase,
time series are clustered using time series clustering algorithms, and then, in the forecasting phase, the behavior of each
segment is predicted via time series forecasting techniques. The main objective is to predict future behavior at segment level.
The forecasting component also consists of a combined method exploiting the concept of forecast fusion. The combined
method employs a pool of forecasters both from traditional time series forecasting and computational intelligence methods.
To test the usefulness of the proposed method, a case study is carried out using the data of customers’ point of sale (POS) in
a bank. The results of the experiments demonstrate that the combined method outperforms all other individual forecasters in
terms of symmetric mean absolute percentage error (SMAPE). The proposed methodology can be correspondingly applied
in other areas and applications of time series forecasting.
H. Abbasimehr, M. Shabani
number of previous studies have thus far conducted segmentation using the RFM model (Akhondzadeh-Noughabi and Albadvi 2015). Besides, it is regarded as a widely used technique for analyzing customer value (Heldt et al. 2019).

To predict customer behavior using the RFM model in a real-world situation, the first prerequisite is to collect suitable data on past transactions. Once the required customer data are obtained, they should be represented in a way that addresses the problem at hand (e.g. forecasting) effectively. To better preserve the dynamic nature of customer behavior, customer data are modeled as time series. Analyzing time series data poses some challenges, including the need to address seasonality, outliers, and noisy data. Moreover, when the number of customers is large, the problem is how to manage the large population of customers, forecast their behavior, and finally construct a representative future time series reflecting total customer behavior. As clustering is utilized to identify customer segments in the case of non-time series data (Ballestar et al. 2018; Dimitriadis et al. 2018; Liu et al. 2019; Sarstedt and Mooi 2019), a methodology inspired by cluster analysis is proposed in the present study for segment-level customer behavior forecasting and is then implemented using banking data. The proposed methodology consists of two main components, namely, clustering and forecasting. The clustering component performs cluster analysis employing time series clustering techniques, and the forecasting component utilizes a new combined method to predict the behavior of each segment.

The contribution of this study is twofold. Firstly, a new methodology is proposed for segment-level forecasting. Secondly, a new combined method is suggested that comprises forecasting techniques from both traditional and computational intelligence methods. Time series forecasting is often considered a challenging task, as each series has its own unique characteristics. Besides, as pointed out in the related literature (Khashei and Bijari 2011), no single technique is able to forecast every series accurately. The distinctive contributions of this study are thus as follows:

• This study integrates the concept of time series clustering and forecasting into customer behavior analysis. By modeling customer behavior as time series, customers' dynamic behavioral patterns are also captured. The representation of customer behavior as a time series additionally allows forecasting of future behavior. As well, the proposed concept is superior to previous approaches for customer behavior analysis.
• One of the key contributions of the present study is leveraging time series clustering for detecting similar customer groups and forecasting the future behavior of each group. Moreover, time series clustering facilitates the management of a large population of customers.
• The proposed methodology contains a time series forecasting component. As stated in previous studies, no forecasting model is best suited for every time series, owing to their intrinsic characteristics. A time series often encompasses both linear and non-linear components, so it is better to combine linear and non-linear time series forecasting techniques to enhance prediction accuracy. Therefore, a combined forecasting technique is proposed that aggregates the results of several linear and non-linear models. The prominent advantage of the combined method is that it utilizes prior information about the performance of a technique in time series forecasting. The prior information of a technique is used by the fusion component of the proposed method to assign an appropriate weight to that technique.

The rest of the paper is structured as follows: Sect. 2 first reviews the literature related to our research and then describes the concepts and techniques utilized throughout the study. In Sect. 3, the proposed methodology is presented. Section 4 describes the application of the proposed methodology. Section 5 provides discussions and managerial implications. In Sect. 6, the paper is concluded.

2 Literature review

In this section, we give the background of the related studies. The methods and techniques utilized throughout this study are also described.

2.1 RFM model

RFM is a popular model introduced by Hughes (2011) which has been employed to measure customer lifetime value in various application areas, e.g. retail banking (Hosseini and Shabani 2015; Khajvand and Tarokh 2011), hygienic industry (Parvaneh et al. 2012, 2014), retailing (Abirami and Pattabiraman 2016; Doğan et al. 2018; Hu and Yeh 2014; Serhat et al. 2017; You et al. 2015), telecommunication (Akhondzadeh-Noughabi and Albadvi 2015; Song et al. 2017), and tourism (Dursun and Caber 2016). The RFM model comprises three attributes: recency (R), frequency (F), and monetary (M). Because the M attribute is of particular importance from the banking viewpoint, it is the attribute forecasted in this study.

2.2 Involving time in customer behavior analysis

Past research in the field of customer behavior analysis has mainly been categorized into static and dynamic customer
A new framework for predicting customer behavior in terms of RFM by considering the temporal…
behavior analysis (Akhondzadeh-Noughabi and Albadvi 2015). The main drawback of the static approaches is that they are valid only at a specific time point and miss significant trends in customer behavior. Therefore, some approaches to dynamic customer behavior analysis have been introduced in the related literature. Table 1 outlines the distinguishing characteristics of some of these studies.

It is essential to incorporate the time factor when conducting a useful dynamic behavior analysis. The best approach to represent dynamic behavior is to model it as a time series. Although previous studies have mainly attempted to consider dynamism using dynamic segmentation, they have difficulty in predicting future behavior. In other words, dynamic segmentation can reveal the trends of past behavior well, but it is not sufficient alone since firms need to predict the future behavior of their customers. To address this problem, a methodology is proposed in the present study to segment customers and then to forecast their behavior.

2.3 Time series clustering

A time series is defined as a sequence of observations ordered in time, typically at equal-length time intervals (Le et al. 2015). For instance, assume that the variable P is measured over n time points; then the time series P is denoted as $P = (p_1, p_2, \cdots, p_{n-1}, p_n)$, wherein each $p_i$ refers to the observation of P at time point i.

Time series clustering is considered a special type of clustering that can be employed for various purposes, including discovering hidden patterns in data, exploratory analysis of data, sampling data, and so on (Aghabozorgi et al. 2015). Given a set of time series data $D = \{P_1, P_2, \cdots, P_n\}$, time series clustering is the task of dividing D into k partitions $C = \{c_1, c_2, \cdots, c_k\}$ such that similar time series are assigned to the same cluster according to a similarity measure. Then, $c_i$ is denoted as a cluster, where $D = \bigcup_{i=1}^{k} c_i$ and $c_i \cap c_j = \emptyset$ for $i \neq j$.

There are two main decisions in time series clustering, i.e. specifying a suitable distance measure to compute the dissimilarity between two series and choosing a proper clustering algorithm. Various dissimilarity measures have been introduced so far in the related literature, including the Euclidean distance, dynamic time warping (DTW), temporal correlation (CORT) … series clustering as they have shown successful results in this context (Saas et al. 2016).

2.3.1 Dissimilarity measures

To describe the following dissimilarity criteria, let us define the two time series $P = (p_1, p_2, \cdots, p_n)$ and $Q = (q_1, q_2, \cdots, q_n)$, where n is the number of time points.

• The Euclidean distance

The dissimilarity of the two time series P and Q under the Euclidean distance is computed following Eq. (1) (Montero and Vilar 2014):

$$d_{L2}(P,Q) = \left( \sum_{t=1}^{n} (p_t - q_t)^2 \right)^{1/2} \tag{1}$$

• DTW

DTW (Ramos et al. 2015) is a quite popular dissimilarity measure. It is computed by finding the optimal alignment between two time series. The optimal path is searched using a dynamic programming approach (Anantasech and Ratanamahatana 2019; Mueen et al. 2018). Considering two time series P and Q, the DTW distance can be denoted by Eq. (2):

$$DTW(P,Q) = \min_{r \in M} \left( \sum_{m=1}^{M} \left| p_{i_m} - q_{j_m} \right| \right) \tag{2}$$

• CORT

The CORT measure calculates the distance between two time series by taking into account both the proximity of the values and the behaviors of the series (temporal correlation) (Chouakria and Nagabhushan 2007; Montero and Vilar 2014). It is defined as Eq. (3) (Montero and Vilar 2014):

$$CORT(P,Q) = \frac{\sum_{t=1}^{n-1} (p_{t+1} - p_t)(q_{t+1} - q_t)}{\sqrt{\sum_{t=1}^{n-1} (p_{t+1} - p_t)^2} \, \sqrt{\sum_{t=1}^{n-1} (q_{t+1} - q_t)^2}} \tag{3}$$
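As an illustration of Eqs. (1) to (3), the three measures can be sketched in plain Python. This is a minimal sketch rather than the implementation used in the study: the function names are hypothetical, the DTW routine uses the textbook dynamic-programming recurrence without any window constraint, and returning 0.0 for a constant series in `cort` is an assumption made only to avoid division by zero.

```python
import math


def euclidean(p, q):
    """Eq. (1): square root of the summed squared pointwise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw(p, q):
    """Eq. (2): minimum cumulative |p_i - q_j| over all warping paths."""
    n, m = len(p), len(q)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of p and j of q
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(p[i - 1] - q[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat q[j-1]
                                    cost[i][j - 1],      # repeat p[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def cort(p, q):
    """Eq. (3): temporal correlation of first differences, in [-1, 1]."""
    dp = [p[t + 1] - p[t] for t in range(len(p) - 1)]
    dq = [q[t + 1] - q[t] for t in range(len(q) - 1)]
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp)) * math.sqrt(sum(b * b for b in dq))
    # Assumption: a constant series has no temporal behavior, so return 0.0.
    return num / den if den else 0.0
```

Two series that rise and fall together give a CORT value near 1, while series moving in opposite directions give a value near -1; DTW, unlike the Euclidean distance, also accepts series of different lengths.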
Table 1 Summary of past research on dynamic behavior analysis

| Study | Contribution | Approach |
|---|---|---|
| Song et al. (2001) | Proposing a methodology that recognizes changes in customer profiles | Discovering association rules from customer profile and sale transaction data at different time points. Then proposing some measure to detect changes in the discovered rules over time |
| Böttcher et al. (2009) | Proposing a system that tracks how customer behavior changes over time | Segmentation of customers via frequent itemset mining and tracking their temporal development |
| Lemmens et al. (2012) | Proposing a new framework that performs dynamic segmentation | A semiparametric hidden Markov model is developed to accommodate dynamic in countries segmentation |
| Akhondzadeh-Noughabi and Albadvi (2015) | Analyzing customer behavior using sequential rule mining | Segmentation of customers in each time step and labeling each cluster and then using sequential rule mining to discover patterns of customer shifts |
| Hosseini and Shabani (2015) | Incorporating factor of time in customer segmentation using RFM | Customer segmentation using K-means algorithm in each ordered time period and labeling each segment and then tracking customer behavior through the time |
| Saas et al. (2016) | Employing time series clustering algorithms for segmenting players in the context of the free-to-play games | Selecting the best similarity measures along with the agglomerative clustering algorithms to discover player behaviors |
| Song et al. (2017) | Proposing a methodology for segmenting customers using RFM model along time intervals | Utilizing K-means algorithm to cluster customers using RFM model along time intervals instead of clustering separately in each time step |
| Ramon-Gonen and Gelbard (2017) | Proposing a cluster evolution analysis that tracks changes in clusters' characteristics and discovers migration patterns of objects between clusters | The cluster evolution analysis model consists of five sub-phases that detects cluster changes and migration patterns |
$$d_{L2}(P,Q) = \left( \sum_{t=1}^{n} (p_t - q_t)^2 \right)^{1/2} \tag{4}$$

$$d_{CID}(P,Q) = CF(P,Q) \cdot d(P,Q) \tag{5}$$

where d(P,Q) corresponds to an existing distance measure, for instance, the Euclidean distance, and CF is a complexity correction factor calculated by Eq. (6):

$$CF(P,Q) = \frac{\max(CE(P), CE(Q))}{\min(CE(P), CE(Q))} \tag{6}$$

where CE(P) and CE(Q) are the complexity estimators of P and Q, …

Generally, time series forecasting techniques can be categorized into two classes: traditional statistically based methods and computational intelligence ones. The statistical methods fall into two groups of linear and nonlinear models. The linear models include moving average, exponential smoothing, and autoregressive integrated moving average (ARIMA) (Khashei and Bijari 2011). The traditional nonlinear methods comprise autoregressive conditional heteroscedasticity (ARCH), generalized autoregressive conditional heteroscedasticity (GARCH), and so on. Moreover, computational intelligence models such as artificial neural networks (ANNs), support vector machines (SVMs), and k-nearest neighbors (KNNs) can discover both linear and nonlinear patterns in time series (Panigrahi and Behera 2017). As ARIMA, SMA, and KNN are utilized in this study, these techniques are described in the following.

2.4.1 ARIMA

ARIMA modeling (Box et al. 2015) is a common technique that has been employed for time series forecasting in many domains.

The multiplicative seasonal ARIMA model, represented as ARIMA$(p,d,q) \times (P,D,Q)_m$, has the following form (Brockwell et al. 2002):

$$\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\, y_t = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t \tag{8}$$

where

$$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \quad \Phi_P(B^m) = 1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm} \tag{9}$$

$$\theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^q, \quad \Theta_Q(B^m) = 1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm} \tag{10}$$

m is the seasonality frequency, B is the backward shift operator, d is the degree of ordinary differencing, D is the degree of seasonal differencing, $\phi_p(B)$ and $\theta_q(B)$ are the regular autoregressive and moving average polynomials of orders p and q, respectively, $\Phi_P(B^m)$ and $\Theta_Q(B^m)$ are the seasonal autoregressive and moving average polynomials of orders P and Q, respectively, and $\varepsilon_t$ is a zero-mean Gaussian white noise process with variance $\sigma^2$. In addition, …

$$\hat{y}_t = \frac{1}{k} \sum_{i=1}^{k} y_{t-i} \tag{11}$$

where $y_t$ indicates an actual value, $\hat{y}_t$ denotes the forecast for point t, and k specifies the number of time points used in SMA (Svetunkov 2017; Svetunkov and Petropoulos 2017).

2.4.3 KNNs

KNNs are an example of lazy learning used for classification and prediction (Han et al. 2011). They are based on learning by comparison, that is, comparing a given test instance with the training instances that are similar to it. KNNs have also been employed for forecasting the NN3 and NN5 competition time series (Crone et al. 2011). In this line, Martínez et al. (2017) proposed a framework for applying KNNs to time series forecasting. They performed experiments to identify the best preprocessing configuration for KNN. After conducting experiments on the NN3 competition data, they arrived at a methodology for applying KNNs to time series forecasting. In another study, Martínez et al. (2018) proposed a new strategy for handling seasonality in time series forecasting. They predicted each season with a different KNN model. The results of experiments on the NN5 data indicated that the proposed method could outperform the regular KNN model. Furthermore, Yu et al. (2016) developed an enhanced KNN model to forecast short-term traffic, which was more appropriate than other models including historical average, least-squares support vector machine (LS-SVM), Elman NN, and the original KNN. As well, Chen and Hao (2017) proposed a hybridized method consisting of feature-weighted SVM
and KNN to forecast stock market indices. Their proposed model additionally obtained better prediction accuracy.

To utilize KNN for time series forecasting, training instances are created from the series data. Instance creation is performed using lags. For example, suppose a time series composed of weekly POS amounts for a customer. Employing lags 1–6 as the input features means that, for a target time step in the series, its six previous observations are regarded as its input features. Table 2 illustrates some training instances extracted using lags 1–6 from a given series.

Table 2 Instances for customer time series with lags 1–6

| Input features | Target |
|---|---|
| 1,2,3,4,5,6 | 7 |
| 2,3,4,5,6,7 | 8 |
| … | … |
| 8,9,10,11,12,13 | 14 |
| … | … |
| 15,16,17,18,19,20 | 21 |

3 Proposed methodology

In this study, a methodology is developed for performing segment-level customer behavior forecasting based on time series analysis techniques. Normally, businesses establish relationships with many customers, and segmentation is conventionally utilized to identify customer segments (Ballestar et al. 2018; Dimitriadis et al. 2018; Liu et al. 2019; Sarstedt and Mooi 2019). Representing customer behavior through time series yields a large number of time series. In this study, inspired by cluster analysis (applied on non-time series data), we propose a methodology for segment-level customer behavior forecasting and implement it using banking data. The proposed methodology consists of two main components, that is, clustering and forecasting. The clustering component performs cluster analysis employing time series clustering techniques, and the forecasting component utilizes a new combined method to forecast the behavior of each segment. The proposed methodology is depicted in Fig. 1, which schematically illustrates its various steps, starting from obtaining input data and ending at analysis. This research framework is based on the methodologies employed for performing data analysis projects.

The proposed methodology contains preprocessing, modeling and evaluation, and analysis phases. The preprocessing step focuses mainly on building time series corresponding to the RFM attributes for each customer, and finally, time series clustering is performed in this phase. The modeling and evaluation phase deals with two main decisions: choosing an appropriate segment-level time series scheme and using a combined method to forecast customer behavior. The analysis phase gives the results of applying the proposed methodology to a case study. The detailed procedure of the methodology and the full specifications of each phase are described throughout this section.
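For reference, two of the building blocks from Sect. 2 that feed the clustering and forecasting components can be sketched as follows: the complexity-invariant distance of Eqs. (5) and (6) and the SMA forecast of Eq. (11). This is a hedged sketch with hypothetical function names. The complexity estimate used here, the root of the summed squared successive differences, is the usual definition of CE from the CID literature and does not appear in the text above, and skipping the correction factor when one series is constant is an assumption.

```python
import math


def complexity_estimate(p):
    """CE(P): root of summed squared successive differences (standard CID form)."""
    return math.sqrt(sum((p[t] - p[t + 1]) ** 2 for t in range(len(p) - 1)))


def cid(p, q):
    """Eqs. (5)-(6): Euclidean distance scaled by the complexity correction factor."""
    base = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    ce_p, ce_q = complexity_estimate(p), complexity_estimate(q)
    low, high = min(ce_p, ce_q), max(ce_p, ce_q)
    if low == 0:
        # Assumption: with a constant series the factor is undefined, so skip it.
        return base
    return (high / low) * base


def sma_forecast(series, k):
    """Eq. (11): forecast the next point as the mean of the last k observations."""
    return sum(series[-k:]) / k
```

For example, `sma_forecast([1, 2, 3, 4], 2)` averages the last two observations, and `cid` grows relative to the plain Euclidean distance as the two series differ more in how much they fluctuate.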
3.2.4 Removing outliers
3.2.5 Normalizing data
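As a hedged illustration of this step, min–max normalization rescales each series to the interval [0, 1]. The function name is hypothetical, and mapping a constant series to all zeros is an assumption made to avoid division by zero.

```python
def min_max_normalize(series):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(series), max(series)
    if high == low:
        # Assumption: a constant series carries no scale, so map it to zeros.
        return [0.0 for _ in series]
    return [(x - low) / (high - low) for x in series]
```

Normalizing each customer's series in this way keeps customers with large transaction amounts from dominating the distance computations used in clustering.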
As stated in Sect. 2, there is not a single forecasting method that performs the best when applied to every time series. To deal with this drawback, a combined method is developed

The combined model firstly computes $w_{ij}$, the weight of each forecaster$_i$ regarding every time series $S_j$. $w_{ij}$ is also calculated based on the performance (competence) of forecaster$_i$
• Removing outliers: To reduce the effect of outliers, outlier detection is performed based on standard deviation.
• Normalizing attributes: In this step, min–max normalization (Han et al. 2011) is used.
• Time series clustering: In this step, agglomerative hierarchical clustering (Han et al. 2011; Saas et al. 2016) with Ward's method (Murtagh and Legendre 2014) is applied using popular distance measures including the Euclidean distance, CORT, DTW, and CID (Montero and Vilar 2014). The outcome of this step is customer segments with the same behavior over time. For both guilds, i.e. Home-Appliance and Retailer, the best time series clustering based on the silhouette validity index (Cheng et al. 2019; Desgraupes 2013) is CID with four clusters (Table 3 and Table 4). For the Home-Appliance guild, one of the clusters has only a few customers; therefore, it is eliminated from further analysis. The population of each customer segment using the CID measure with four clusters is illustrated in Table 5.

Table 3 Clustering results with different numbers of clusters, Home-Appliance guild (Ward's method)

| Similarity measure | K=4 | K=5 | K=6 | K=7 | K=8 |
|---|---|---|---|---|---|
| Euclidean | 0.27 | 0.27 | 0.27 | 0.05 | 0.05 |
| CORT | 0.15 | 0.16 | 0.17 | 0.17 | 0.18 |
| DTW | 0.32 | 0.02 | 0.03 | 0.03 | 0.03 |
| CID | 0.34 | 0.34 | 0.34 | 0.33 | 0.23 |

The best silhouette index value is indicated in bold

Table 4 Clustering results with different numbers of clusters, Retailer guild (Ward's method)

| Similarity measure | K=4 | K=5 | K=6 | K=7 | K=8 |
|---|---|---|---|---|---|
| Euclidean | 0.09 | 0.09 | 0.09 | 0.1 | 0.09 |
| CORT | 0.14 | 0.13 | 0.12 | 0.09 | 0.1 |
| DTW | 0.21 | 0.21 | 0.08 | 0.08 | 0.05 |
| CID | 0.32 | 0.29 | 0.29 | 0.19 | 0.19 |

The best silhouette index value is indicated in bold

After completing the time series clustering step, a total of seven clusters result. Each cluster is treated as a separate dataset, so the number of datasets utilized in the forecasting phase is 7.

4.3 Choosing a segment‑level time series representation scheme

Extensive experiments using the two representation schemes show that the models built with the aggregate forecast scheme and individual forecasters fail to perform well with respect to the SMAPE. Therefore, the pool of models is chosen only from those built with the customer-wise scheme, in order to develop an accurate forecasting model.

4.4 Modeling

4.4.1 Choosing time series forecasting techniques

In this phase, methods from both traditional time series forecasting and computational intelligence need to be selected. In this study, ARIMA is used as it is the most widely used traditional method. Although many methods have been proposed so far in the area of time series prediction, the ARIMA method remains popular as it is flexible and has shown good performance (Kourentzes and Petropoulos 2016; Murray et al. 2018). Also, SMA is employed as it has performed well in time series forecasting (Andrawis et al. 2011). Among computational intelligence methods, only the KNN method is used, as it has shown good performance in recent studies (e.g. Martínez et al. 2017, 2018). The primary reason for using KNN in time series forecasting is that a time series contains repetitive patterns, so previous patterns similar to the current data can be found and exploited to predict future behavior (Martínez et al. 2017). Furthermore, according to Martínez et al. (2017), KNN is easier to implement and computationally more efficient. Besides, it is able to handle seasonal patterns (Martínez et al. 2017). The reason for not including neural network-based methods is that they require enough data to train a model. However, time series data are usually short and the amount of information that can be extracted is very small (Yan 2012).
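The pattern-matching idea behind KNN forecasting, together with the lag-based instance creation illustrated in Table 2, can be sketched as below. This is a minimal one-step-ahead sketch with hypothetical function names. Using the Euclidean distance between lag vectors and averaging the targets of the k nearest instances are assumptions, and the preprocessing and multi-step strategies discussed by Martínez et al. are not reproduced here.

```python
import math


def make_instances(series, lags):
    """Slide a window of `lags` points; the next point is the target."""
    inputs, targets = [], []
    for t in range(lags, len(series)):
        inputs.append(series[t - lags:t])
        targets.append(series[t])
    return inputs, targets


def knn_forecast(series, lags, k):
    """Forecast the next value as the mean target of the k nearest past windows."""
    inputs, targets = make_instances(series, lags)
    query = series[-lags:]  # the most recent window, whose successor is unknown
    ranked = sorted((math.dist(x, query), y) for x, y in zip(inputs, targets))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

Applied to the series 1, 2, ..., 20 with `lags=6`, `make_instances` reproduces the first rows of Table 2: the window (1, ..., 6) with target 7, then (2, ..., 7) with target 8, and so on.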
According to the combined method, the first phase is to learn the weight associated with each forecaster. The three mentioned methods, SMA, ARIMA, and KNN, are thus applied. To utilize KNN, we set K = 1, 2, 4, 6 and lag = 1, 4, 8. Therefore, there are 14 models, and all are input to the weight learning step.

• Weight learning

To conduct the weight learning step according to the proposed combined method, the training part of each time series is divided into two sections: training data (points 1:32) and validation data (points 33:36). The methods selected on the basis of performance and diversity are SMA, ARIMA, KNN with k = 1 and lag-1:4, KNN with k = 1 and lag-1:8, and KNN with k = 2 and lag-1:8. These models are employed in the model building step. A total of 9 out of 14 forecasters are eliminated, as their average ranks are not high enough to be included in the pool of models. Since the performance of the combined method depends on its forecasters, it is essential to keep the best performing methods. The weight $w_{ij}$ is thus computed as follows:

$$w_{ij} = \frac{1}{e^2 + \varepsilon} \tag{14}$$

where $w_{ij}$ is the weight of forecaster i when applied to time series j, e refers to the error in terms of the SMAPE, and $\varepsilon$ denotes a constant that avoids division by zero.

• Model building

In this step, the models selected in the weight learning step are utilized as the pool of models employed in the combined method. The combined method and the 5 individual forecasters are accordingly applied to the 7 datasets. The evaluation of their performance is provided in the next subsection.

4.5 Evaluation

The results of comparing the proposed combined method with the individual forecasting methods on the 7 datasets (corresponding to 3 clusters of the Home-Appliance guild and 4 clusters of the Retailer guild, respectively) are illustrated in Table 6. As observed, the average rank of the combined method is 1.14, beating all 5 individual forecasters.

4.5.1 Statistical comparison of predictors

To evaluate whether the performance of the proposed method is significantly better than that of the individual forecasters, statistical tests are conducted. In this study, the two-step procedure proposed by Demšar (2006) is utilized. At first, the Friedman test is carried out to test whether the individual approaches are equivalent concerning the SMAPE. If the null hypothesis (i.e. that all forecasters have the same mean rank) is rejected, then the post hoc test is performed to determine any significant differences between
the forecasters. In the following, we describe this two-step methodology.

To determine whether there are differences between forecasters, in this study, we use Friedman's test (Demšar 2006). In our research, we have k = 6 forecasting methods that are compared on D = 7 datasets (corresponding to the number of clusters). As illustrated in the previous subsection, for each dataset $i \in [1, D]$, the 6 methods $j \in [1, k]$ are employed. The methods are ranked according to their forecasting performance on the datasets. Since SMAPE was selected as the performance measure, the methods are sorted in ascending order of their SMAPE values. In case of ties, average ranks are allocated.

For each method j, its average rank over all the datasets is calculated as $AR_j = \frac{1}{D} \sum_{i=1}^{D} R_{ij}$, where $R_{ij}$ denotes the rank of method j on dataset i. The null hypothesis of Friedman's test when comparing the average ranks $AR_j$ is that all methods perform similarly. Under the null hypothesis, the Friedman statistic is given by Eq. (15):

$$\chi_F^2 = \frac{12D}{k(k+1)} \left[ \sum_{j=1}^{k} AR_j^2 - \frac{k(k+1)^2}{4} \right] \tag{15}$$

where k is the number of forecasters, D is the number of datasets, and $\chi_F^2$ follows the chi-squared distribution with k − 1 degrees of freedom. If the value of $\chi_F^2$ is large enough, then the null hypothesis is rejected (Demšar 2006). As mentioned before, we performed Friedman's test. Table 6 shows the p-value of Friedman's test, which indicates that the null hypothesis is rejected at the 0.05 significance level. Therefore, there are significant differences among the forecasters.

In the second step, we applied Hochberg's post hoc test (Demšar 2006) to check for significant differences between individual forecasters. This test is used for comparing the combined method i with each other method j by computing the z statistic from their average ranks using Eq. (16):

$$z = \frac{AR_i - AR_j}{\sqrt{\dfrac{k(k+1)}{6D}}} \tag{16}$$

where k is the number of forecasters and D is the number of datasets. In Hochberg's test, we first computed the p-values of z for the k − 1 comparisons with the control model (the proposed method) and then sorted them in ascending order. Afterward, a series of pairwise comparisons is performed as follows: first, the largest p-value is compared with the significance level $\alpha = 0.1$; then, the second-largest p-value is compared with $\alpha/2$, and so forth, until a rejection of the null hypothesis is found. All the remaining hypotheses are rejected too.

Table 7 illustrates the p-values corresponding to the z-values of Eq. (16), computed when comparing the average rank of the combined method with the other methods, including SMA, ARIMA, KNN with k = 1, lag-1:4, KNN with k = 1, lag-1:8, and KNN with k = 2, lag-1:8. It can be seen that there are significant differences in the average ranks between the combined method and the other forecasters. The largest p-value is 0.086474, which is below $\alpha = 0.1$. Therefore, the corresponding null hypothesis is rejected. This indicates that there are meaningful differences in the average rank between the combined method and the other forecasters. These results are consistent with the previous finding that an ensemble of forecasters improves on the individual forecasts (Andrawis et al. 2011).

Table 7 P-values of post-hoc tests comparing the combined method with other individual forecasters

| Forecaster | SMA | ARIMA | KNN (K = 1, lag-1:4) | KNN (K = 1, lag-1:8) | KNN (K = 2, lag-1:8) |
|---|---|---|---|---|---|
| p value | 0.086474 | 0.000115 | 0.010141 | 0.0027 | 0.0027 |

The significant characteristic of the proposed combined method is that it employs forecasters from both linear and non-linear techniques. Both traditional models, e.g., ARIMA, and computational intelligence models such as KNN have obtained success in many domains. However, none of them is suitable for all circumstances (i.e., time series with different characteristics). Therefore, the combined technique can tackle the limitations of individual forecasting techniques and thereby increase forecasting accuracy. Besides, time series obtained from a real problem may not show deterministic characteristics; thus, the proposed combined methodology, which incorporates traditional and computational intelligence methods, can be a suitable choice for forecasting real-world time series.

4.5.2 Time complexity of the combined method

Time complexity is another metric to assess the performance of the proposed combined method. The ideal situation is to have fast and accurate methods. However, the combined method utilizes the results of multiple forecasters to improve accuracy. We conducted our experiment on a computer with an Intel(R) Xeon(R) CPU 2.00 GHz and 16 GB of RAM. The mean execution time (in seconds) for the combined method was 273.56. For the KNN, ARIMA, and SMA, the mean execution time was 1.66, 140.39, and 1.11, respectively. The execution time of the combined method is the sum of the execution times of the three single models, as well as the time of weight learning. Therefore, its execution time mainly depends on the time complexity of its constituent models. The reason for the high execution time of the ARIMA model is that it searches for the best ARIMA parameters and executes different model configurations.
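The two-step statistical comparison of Sect. 4.5.1 can be sketched in plain Python as follows. The function names are hypothetical. The average ranks, the statistic of Eq. (15), the z value of Eq. (16), and the step-up decision rule follow the description in the text, whereas converting z to a two-sided p-value through the standard normal distribution is an assumption about how the reported p-values were obtained.

```python
import math


def average_ranks(errors):
    """errors[i][j]: SMAPE of method j on dataset i. Rank 1 is the lowest error."""
    k = len(errors[0])
    totals = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the block over tied values so they share the average rank.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / len(errors) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Eq. (15): chi-squared statistic with k - 1 degrees of freedom."""
    k = len(avg_ranks)
    spread = sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    return 12 * n_datasets / (k * (k + 1)) * spread


def z_statistic(rank_i, rank_j, k, n_datasets):
    """Eq. (16): standardized difference between two average ranks."""
    return (rank_i - rank_j) / math.sqrt(k * (k + 1) / (6 * n_datasets))


def two_sided_p(z):
    """Assumption: two-sided p-value of z under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def hochberg_reject(p_values, alpha=0.1):
    """Step-up rule: compare the largest p with alpha, the next with alpha/2, ..."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    reject = [False] * len(p_values)
    for step, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / step:
            # First rejection found: this and all smaller p-values are rejected.
            for j in order[step - 1:]:
                reject[j] = True
            break
    return reject
```

Applied to the p-values reported in Table 7 with $\alpha = 0.1$, `hochberg_reject` rejects all five hypotheses, because the largest p-value (0.086474) is already below $\alpha$.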
The critical point is that in our case, the primary goal is to have an accurate method. Also, the results of the proposed model will be used by marketing planners. In other words, the model presented will not be used in real time. Therefore, the execution time of the combined method is acceptable.

4.6 Analysis

Utilizing the proposed methodology, the segment-wise future behavior of customers is obtained. After conducting the clustering step on the bank customers belonging to the Retailer and Home-Appliance guilds, we obtained behaviorally similar groups of customers for each guild. For the Home-Appliance and Retailer guilds, three and four clusters resulted, respectively. Afterwards, the combined forecasting method was applied to the resulting clusters. Besides, the performance of the combined forecasting component was evaluated using the Friedman test and Hochberg’s method.

Once the best forecasting model for each cluster is achieved, the resulting model is employed to predict the future behavior of each segment. The results of the proposed methodology are distinct from the conventional approaches employing a static segmentation. The proposed methodology can better capture the dynamic behavior of customers via formulating customer behavior as a time series. The forecasted values of the overall trend of behavior are plotted in Figs. 5, 6, 7, 8, 9, 10, 11.

[Figs. 5, 6, 7, 8, 9, 10, 11: predicted M (vertical axis) plotted against time points 1 to 8 (horizontal axis), one plot per cluster.]

4.6.1 Prediction of behavior of home-appliance guild

According to the results of the clustering step, the Home-Appliance guild contains 3 clusters. Figures 5, 6, 7 illustrate the future behaviors of customers for the next 8 weeks for Clusters 1, 2, and 3, respectively. The Figures also portray the predicted value of M using a solid line and show the trend via a dashed line.

For Cluster 1, which is considered a high-value customer group, the behavioral trend is relatively stable. Cluster 2, which is recognized as a middle-value segment, has a trend similar to Cluster 1; thus its trend is also relatively stable. For Cluster 3, we observe an increasing trend. The bank should thus devise appropriate marketing programs to turn the stable trends of Cluster 1 and Cluster 2 into increasing trends.

4.6.2 Prediction of behavior of retailer guild

Similar to the Home-Appliance guild, in this section, the future behavior of clusters belonging to the Retailer guild is predicted using the combined method. Customers of the Retailer guild were clustered into 4 clusters.

Figures 8, 9, 10, 11 illustrate the future behaviors of customers for the next 8 weeks for Clusters 1, 2, 3, and 4, respectively. The Figures also portray the predicted value
of M using a solid line and represent the trend through a dashed line.

The forecasted behaviors of Clusters 1–4 of the Retailer guild indicate that the trends of all customer segments are relatively stable. The bank should thus introduce more loyalty programs to engage the customers of this group to increase their profitability.

5 Discussion and managerial implications

Proposing a hybrid methodology for customer behavior forecasting is the main contribution of this study. The methodology first formulates customer behavior attributes as time series and then accomplishes a dynamic segmentation employing time series clustering. Second, it performs time series forecasting to predict the future behavior of customer segments. Many of the proposed approaches for customer segmentation are considered static segmentation since they conduct clustering at a single point in time. The main drawback of the static segmentation methods is that they fail to capture the complex and uncertain nature of customer behavior (Akhondzadeh-Noughabi and Albadvi 2015). Therefore, this study contributes to the customer relationship literature. As another contribution, this study evaluates two schemes for segment-level customer behavior forecasting. Based on the results, it is concluded that the customer-wise approach outperforms the aggregate-forecast one.

The proposed methodology contains a time series forecasting component. Past literature on time series forecasting (e.g. Khashei and Bijari 2011) has stated that an individual forecasting method may not perform well in all circumstances. Therefore, in this study, a combined method is proposed to improve prediction accuracy. This method utilizes prior information about a forecaster’s performance and combines the results of multiple forecasting methods. In this study, ARIMA, SMA, and KNN are selected to be incorporated into the combined method, which can be applied using any forecasting techniques. The results of this study demonstrate that the combined method outperforms the individual forecasting ones. The proposed combined method performs fusion based on the performance of a forecaster on the validation set. Therefore, for a time series with highly non-stationary and chaotic behavior, this method may lead to lower accuracy. It is thus very suitable for time series with repetitive patterns.

The case study is conducted in a business-to-business (B2B) context as it considers merchants as business customers of the bank. As pointed out in (Čater and Čater 2010), establishing a long-term relationship with customers is the core of B2B marketing. The bank selected as the case study has many merchants who use POS terminals. In order to manage the growing population of merchants, the bank can utilize the proposed methodology to identify customer segments and to predict the behavior of each segment. Therefore, the bank can formulate segment-specific marketing programs to maximize merchant value and to build long-term relationships through accurate forecasting of customer behavior. The proposed data-driven methodology can be used in any domain having customer data.

6 Conclusion

Forecasting future behavior of customers is one of the core activities of almost every business and industry. By forecasting customer behavior, firms can devise suitable marketing strategies to improve their relationships with customers. In this study, a new methodology is proposed for segment-level customer behavior forecasting based on time series clustering and forecasting. The proposed methodology consists of strategies for performing segment-level time series prediction including aggregate-forecast and customer-wise schemes. Besides, it contains a new combined forecaster benefiting from prior knowledge about the performance of individual forecasting models (that is, pool of models). The combined method is developed to address the drawback of individual forecasting methods as none of them perform
the best for all circumstances (i.e. time series with different characteristics). The proposed method is applied to real banking data of POS customers by incorporating traditional statistical time series methods, including ARIMA and SMA, and a computational intelligence method, namely KNN. The results of the evaluation indicate that the combined method has superior performance over individual forecasters. To show that the superiority of the proposed method is significant, statistical tests are also carried out. The results of the statistical tests confirm that the proposed method outperforms other single methods. For our future work, we will develop other combination strategies to aggregate the output of forecasters. Also, we can incorporate other forecasting techniques suitable for modeling small data.

Acknowledgments The authors would like to thank Aram Bahrini for providing language help during the writing of this paper.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

Abirami M, Pattabiraman V (2016) Data mining approach for intelligent customer behavior analysis for a retail store. In: Vijayakumar V, Neelanarayanan V (eds) Proceedings of the 3rd international symposium on big data and cloud computing challenges (ISBCC-16’), 2016. Springer, Cham, pp 283–291
Aghabozorgi S, Seyed Shirkhorshidi A, Ying Wah T (2015) Time-series clustering – A decade review. Information Systems 53:16–38. https://doi.org/10.1016/j.is.2015.04.007
Akhondzadeh-Noughabi E, Albadvi A (2015) Mining the dominant patterns of customer shifts between segments by using top-k and distinguishing sequential rules. Manag Decis 53:1976–2003
Anantasech P, Ratanamahatana CA (2019) Enhanced weighted dynamic time warping for time series classification. In: Third international congress on information and communication technology, 2019. Springer, New York, pp 655–664
Andrawis RR, Atiya AF, El-Shishiny H (2011) Forecast combinations of computational intelligence and linear models for the NN5 time series forecasting competition. Int J Forecast 27:672–688
Ballestar MT, Grau-Carles P, Sainz J (2018) Customer segmentation in e-commerce: applications to the cashback business model. J Bus Res 88:407–414. https://doi.org/10.1016/j.jbusres.2017.11.047
Batista GE, Keogh EJ, Tataw OM, De Souza VM (2014) CID: an efficient complexity-invariant distance for time series. Data Min Knowl Disc 28:634–669
Böttcher M, Spott M, Nauck D, Kruse R (2009) Mining changing customer segments in dynamic markets. Expert Syst Appl 36:155–164. https://doi.org/10.1016/j.eswa.2007.09.006
Box GE, Jenkins GM, Reinsel GC, Ljung GM (2015) Time series analysis: forecasting and control. Wiley, New Jersey
Brockwell PJ, Davis RA, Calder MV (2002) Introduction to time series and forecasting. Springer New York. https://doi.org/10.1007/b97391
Čater T, Čater B (2010) Product and relationship quality influence on customer commitment and loyalty in B2B manufacturing relationships. Ind Mark Manag 39:1321–1333
Cen Z, Wang J (2018) Forecasting neural network model with novel CID learning rate and EEMD algorithms on energy market. Neurocomputing 317:168–178. https://doi.org/10.1016/j.neucom.2018.08.021
Chan CCH, Hwang Y-R, Wu H-C (2016) Marketing segmentation using the particle swarm optimization algorithm: a case study. J Ambient Intell Hum Comput 7:855–863. https://doi.org/10.1007/s12652-016-0389-9
Chen Y, Hao Y (2017) A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction. Expert Syst Appl 80:340–355. https://doi.org/10.1016/j.eswa.2017.02.044
Cheng D, Zhu Q, Huang J, Wu Q, Yang L (2019) A novel cluster validity index based on local cores. IEEE Trans Neural Netw Learn Syst 30:985–999. https://doi.org/10.1109/TNNLS.2018.2853710
Chiang W-Y (2018) Applying data mining for online CRM marketing strategy: an empirical case of coffee shop industry in Taiwan. Br Food J 120:665–675
Chouakria AD, Nagabhushan PN (2007) Adaptive dissimilarity index for measuring time series proximity. Adv Data Anal Classif 1:5–21
Crone SF, Hibon M, Nikolopoulos K (2011) Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. Int J Forecast 27:635–660
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Desgraupes B (2013) Clustering indices. Univ Paris Ouest-Lab Modal’X 1:34
Dimitriadis S, Kyrezis N, Chalaris M (2018) A comparison of two multivariate analysis methods for segmenting users of alternative payment means. Int J Bank Market 36:322–335
Doğan O, Ayçin E, Bulut ZA (2018) Customer segmentation by using RFM model and clustering methods: a case study in retail industry. Int J Contemp Econ Admin Sc 8:1–19
Duan Y, Cao G, Edwards JS (2018) Understanding the impact of business analytics on innovation. Eur J Oper Res. https://doi.org/10.1016/j.ejor.2018.06.021
Dursun A, Caber M (2016) Using data mining techniques for profiling profitable hotel customers: an application of RFM analysis. Tour Manag Perspect 18:153–160. https://doi.org/10.1016/j.tmp.2016.03.001
Grover V, Chiang RH, Liang T-P, Zhang D (2018) Creating strategic business value from big data analytics: a research framework. J Manag Inf Syst 35:388–423
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Elsevier, Waltham
Heldt R, Silveira CS, Luce FB (2019) Predicting customer value per product: From RFM to RFM/P. J Bus Res. https://doi.org/10.1016/j.jbusres.2019.05.001
Hosseini M, Shabani M (2015) New approach to customer segmentation based on changes in customer value. J Market Anal 3:110–121. https://doi.org/10.1057/jma.2015.10
Hu Y-H, Huang TC-K, Kao Y-H (2013) Knowledge discovery of weighted RFM sequential patterns from customer sequence databases. J Syst Softw 86:779–788. https://doi.org/10.1016/j.jss.2012.11.016
Hu Y-H, Yeh T-W (2014) Discovering valuable frequent patterns based on RFM analysis without customer identification information. Knowl-Based Syst 61:76–88
Hughes A (2011) Strategic database marketing: the masterplan for starting and managing a profitable, customer-based marketing program, 4th edn. McGraw-Hill, New York
Khajvand M, Tarokh MJ (2011) Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Proc Comput Sci 3:1327–1332
Khashei M, Bijari M (2011) A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl Soft Comput 11:2664–2675
Khobzi H, Akhondzadeh-Noughabi E, Minaei-Bidgoli B (2014) A new application of RFM clustering for guild segmentation to mine the pattern of using banks’ e-payment services. J Global Market 27:178–190. https://doi.org/10.1080/08911762.2013.878428
Kourentzes N, Petropoulos F (2016) Forecasting with multivariate temporal aggregation: the case of promotional modelling. Int J Prod Econ 181:145–153
Kumar V, Reinartz W (2018) Customer relationship management: concept, strategy, and tools, 3rd edn. Springer, Berlin. https://doi.org/10.1007/978-3-662-55381-7
Kumar V, Shah D (2004) Building and sustaining profitable customer loyalty for the 21st century. J Retail 80:317–329
Le DD, Gross G, Berizzi A (2015) Probabilistic modeling of multisite wind farm production for scenario-based applications. IEEE Trans Sustain Energy 6:748–758
Lemmens A, Croux C, Stremersch S (2012) Dynamics in the international market segmentation of new product growth. Int J Res Mark 29:81–92. https://doi.org/10.1016/j.ijresmar.2011.06.003
Lessmann S, Haupt J, Coussement K, Bock KWD (2019) Targeting customers for profit: An ensemble learning framework to support marketing decision-making. Inf Sci. https://doi.org/10.1016/j.ins.2019.05.027
Liu J, Liao X, Huang W, Liao X (2019) Market segmentation: a multiple criteria approach combining preference analysis and segmentation decision. Omega 83:1–13
Martínez F, Frías MP, Pérez-Godoy MD, Rivera AJ (2018) Dealing with seasonality by narrowing the training set in time series forecasting with kNN. Expert Syst Appl 103:38–48
Martínez F, Frías MP, Pérez MD, Rivera AJ (2017) A methodology for applying k-nearest neighbor to time series forecasting. Artif Intell Rev 1–19
Montero P, Vilar JA (2014) Tsclust: an R package for time series clustering. J Stat Softw 62:1–43
Mueen A, Chavoshi N, Abu-El-Rub N, Hamooni H, Minnich A, MacCarthy J (2018) Speeding up dynamic time warping distance for sparse time series data. Knowl Inf Syst 54:237–263
Murray PW, Agard B, Barajas MA (2018) Forecast of individual customer’s demand from a large and noisy dataset. Comput Ind Eng 118:33–43
Murtagh F, Legendre P (2014) Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J Classif 31:274–295. https://doi.org/10.1007/s00357-014-9161-z
Ngai EWT, Xiu L, Chau DCK (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36:2592–2602. https://doi.org/10.1016/j.eswa.2008.02.021
Panigrahi S, Behera HS (2017) A hybrid ETS–ANN model for time series forecasting. Eng Appl Artif Intell 66:49–59
Paparrizos J, Gravano L (2017) Fast and accurate time-series clustering. ACM Trans Database Syst 42:1–49. https://doi.org/10.1145/3044711
Parvaneh A, Abbasimehr H, Tarokh MJ (2012) Integrating AHP and data mining for effective retailer segmentation based on retailer lifetime value. Journal of Optimization in Industrial Engineering 5:25–31
Parvaneh A, Tarokh M, Abbasimehr H (2014) Combining data mining and group decision making in retailer segmentation based on LRFMP variables. Int J Ind Eng Prod Res 25:197–206
Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44:678–693
Ramon-Gonen R, Gelbard R (2017) Cluster evolution analysis: identification and detection of similar clusters and migration patterns. Expert Syst Appl 83:363–378. https://doi.org/10.1016/j.eswa.2017.04.007
Ramos P, Santos N, Rebelo R (2015) Performance of state space and ARIMA models for consumer retail sales forecasting. Rob Comput Integr Manuf 34:151–163
Saas A, Guitart A, Periáñez Á (2016) Discovering playing patterns: time series clustering of free-to-play game data. In: 2016 IEEE conference on computational intelligence and games (CIG), Santorini, Greece, 20–23 Sept 2016. IEEE, New York, pp 1–8. https://doi.org/10.1109/CIG.2016.7860442
Sarstedt M, Mooi E (2019) Cluster analysis. A concise guide to market research. Springer texts in business and economics. Springer, Berlin, pp 301–354
Serhat P, Altan K, Erhan EP (2017) LRFMP model for customer segmentation in the grocery retail industry: a case study. Mark Intell Plan 35:544–559. https://doi.org/10.1108/MIP-11-2016-0210
Song HS, Kyeong Kim J, Kim SH (2001) Mining the change of customer behavior in an internet shopping mall. Expert Syst Appl 21:157–168
Song M, Zhao X, Ou EH (2017) Statistics-based CRM approach via time series segmenting RFM on large scale data. Knowl-Based Syst 132:21–29. https://doi.org/10.1016/j.knosys.2017.05.027
Svetunkov I (2017) Statistical models underlying functions of ’smooth’ package for R. In: Working paper of Department of Management Science, Lancaster University, pp 1–52
Svetunkov I, Petropoulos F (2017) Old dog, new tricks: a modelling view of simple moving averages. Int J Prod Res 56:6034–6047. https://doi.org/10.1080/00207543.2017.1380326
Yan W (2012) Toward automatic time-series forecasting using neural networks. IEEE Trans Neural Netw Learn Syst 23:1028–1039. https://doi.org/10.1109/TNNLS.2012.2198074
Yildirim P, Birant D, Alpyildiz T (2018) Data mining and machine learning in textile industry. Wires Data Min Knowl Discov 8:e1228. https://doi.org/10.1002/widm.1228
You Z, Si Y-W, Zhang D, Zeng X, Leung SCH, Li T (2015) A decision-making framework for precision marketing. Expert Syst Appl 42:3357–3367. https://doi.org/10.1016/j.eswa.2014.12.022
Yu B, Song X, Guan F, Yang Z, Yao B (2016) k-Nearest neighbor model for multiple-time-step prediction of short-term traffic condition. J Transp Eng 142:04016018

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.