Direct Marketing Decision Support Through Predictive Customer Response Modeling
David L. Olson a, Bongsug (Kevin) Chae b
a Department of Management, University of Nebraska, Lincoln, NE 68588-0491, United States
b Department of Management, Kansas State University, Manhattan, KS 66506, United States
Article history:
Received 8 June 2011
Received in revised form 12 May 2012
Accepted 19 June 2012
Available online 3 July 2012

Keywords:
Customer response predictive model
Knowledge-based marketing
RFM
Neural networks
Decision tree models
Logistic regression

Abstract
Decision support techniques and models for marketing decisions are critical to retail success. Among different marketing domains, customer segmentation or profiling is recognized as an important area in research and industry practice. Various data mining techniques can be useful for efficient customer segmentation and targeted marketing. One such technique is the RFM method. Recency, frequency, and monetary methods provide a simple means to categorize retail customers. We identify two sets of data involving catalog sales and donor contributions. Variants of RFM-based predictive models are constructed and compared to classical data mining techniques of logistic regression, decision trees, and neural networks. The spectrum of tradeoffs is analyzed. RFM methods are simpler, but less accurate. The effects of balancing cell densities and of compressing RFM variables into a value function are examined, and classical data mining algorithms (decision trees, logistic regression, neural networks) are also applied to the data. Both balancing expected cell densities and compressing RFM variables into a value function were found to provide models similar in accuracy to the basic RFM model, with slight improvement obtained by increasing the cutoff rate for classification. Classical data mining algorithms were found to yield better prediction, as expected, in terms of both prediction accuracy and cumulative gains. Relative tradeoffs among these data mining algorithms in the context of customer segmentation are presented. Finally, we discuss practical implications based on the empirical results.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
The role of decision support techniques and models for marketing decisions has been important since the inception of decision support systems (DSSs) [25]. Diverse techniques and models (e.g., optimization, knowledge-based systems, simulation) have emerged over the last five decades. Many marketing domains, including pricing, new product development, and advertising, have benefited from these techniques and models [16]. Among these marketing domains, customer segmentation or profiling is recognized as an important area [18,19,26,43]. There are at least two reasons for this. First, the marketing paradigm is becoming customer-centric [41], making targeted marketing and service appropriate. Second, unsolicited marketing is costly and ineffective (e.g., low response rate) [15,30]. Along with these reasons, there are increasing efforts to collect and analyze customer data for better marketing decisions [9,26,30]. The advancement of online shopping technologies and database systems has accelerated this trend.
Data mining has been a valuable tool in this regard. Various data mining techniques, including statistical analysis and machine learning algorithms, can be useful for efficient customer segmentation and targeted marketing [4,26,38]. One such technique is RFM, standing for recency, frequency, and monetary. RFM analysis has long been used for marketing decisions and is recognized as a useful data mining technique for customer segmentation and response models [3,30]. A survey [43] also shows that RFM is among the most popular segmentation and predictive modeling techniques used by marketers.

RFM relies on three customer behavioral variables (how long since the last purchase by the customer, how often the customer purchases, how much the customer has bought) to find valuable customers or donors and develop future direct marketing campaigns. Having a reliable and accurate customer response model is critical for marketing success, since an increase or decrease in accuracy of 1% could have a significant impact on profits [1]. While there could be many other customer-related factors [e.g., 42], previous studies have shown that RFM alone can offer a powerful way of predicting future customer purchases [1,3,17].
Our research builds customer response models using RFM variables and compares them in terms of customer gains and prediction accuracy. The paper aims to increase understanding of how to find knowledge hidden in customer and transactional databases using data mining techniques. This area is called knowledge-based marketing [26]. The next section briefly reviews various data mining techniques for building customer response or predictive models. Section 3 describes methodology. All the response models are built upon the three RFM variables, while different data mining techniques are used. Then, we present a research design, including two direct marketing data sets with over 100,000 observations, a process of predictive model building, and methods to measure the performance of models. Section 4 includes analysis and results. There could be different methods to increase the
prediction performance of an RFM-based predictive model, and sophisticated data mining techniques (decision tree, logistic regression, and neural networks) appear to outperform the more traditional RFM. These findings are further discussed in Section 5, comparing results with previous studies of customer response models and in the broader context of knowledge-based marketing. We also discuss practical implications from the findings and offer conclusions.
The contribution of this study is to demonstrate how RFM model variants can work, and to support the general conclusion, consistently reported by others, that RFM models are inferior to traditional data mining models. This study shows that RFM variables are very useful inputs for designing various customer response models with different strengths and weaknesses, and that the models relying on classical data mining (or predictive modeling) techniques can significantly improve prediction capability in direct marketing decisions. These predictive models using RFM variables are simpler and easier to use in practice than those with a complex set of variables. Thus, besides the descriptive modeling techniques popular in practice [43], marketers should adopt such advanced predictive models in their direct marketing decisions.
2. Customer response models using data mining techniques
2.1. Marketing DSS and customer response models
The use of DSS in marketing goes back to the 1960s and 1970s [22,44] and has been applied in various areas, including marketing strategy, pricing, new product development, and product analysis and management [16]. There has been an increase in DSS use in customer-side marketing activities, such as customer segmentation (or profiling), direct marketing, database marketing, and targeted advertising. This reflects advances in database management and complex model building [11,16,35]. More convenient methods are available for the acquisition and storage of large amounts of customer and transactional data. In addition, knowledge-based systems or intelligent systems using data mining techniques (e.g., neural networks) [37] have emerged in the marketing domain.
This trend is broadly termed knowledge-based marketing. Knowledge-based marketing is both data-driven and model-driven: that is, the use of sophisticated data mining tools and methods for knowledge discovery from customer and transactional databases [26]. Overall, this leads to more efficient and effective communication with potential buyers and an increase in profits. An important approach to knowledge-based marketing is to understand customers and their behavioral patterns. This requires using such transactional characteristics as recency, frequency, and size of purchases to identify customer groups and predict purchases [35]. The RFM model and other data mining-based customer response models have proven useful to marketers.
2.2. Data mining techniques for customer response models
2.2.1. RFM
R represents the period since the last purchase. F is the number of purchases made by a customer during a certain period. M is the total purchase amount by a customer over that period. It is common practice for each of R, F, and M to have five groups or levels, and thus there are 125 (= 5 × 5 × 5) customer segmentation groups. Each customer is segmented into one cell or group. This model allows marketers to differentiate their customers in terms of three factors and to target the customer groups that are likely to purchase products or services. This technique is known as the benchmark model in the area of database marketing [3].
Since its introduction in a major marketing journal [5], RFM has received a great deal of interest from both academic and industry communities [3,17]. Many studies [1,13,17] have recognized these three variables as important for predicting future responses by customers to potential direct marketing efforts. Certain limitations in the original RFM model have been recognized in the literature [31,45]. Some previous studies have extended the original RFM model either by considering additional variables (e.g., socio-demographics) [1] or by combining it with other response techniques [6,7]. Because of the high correlation between F and M, Yang [45] offered a version of the RFM model collapsing the data to a single variable, Value = M/R. To overcome the problem of skewed data in RFM cells, Olson et al. [31] proposed an approach to balance observations in each of the 125 RFM cells.

Other variables that may be important include customer income, customer lifestyle, customer age, product variation, and so on [14]. That would make traditional data mining tools such as logistic regression more attractive. However, RFM is the basis for a continuing stream of techniques to improve customer segmentation marketing [12]. RFM has been found to work relatively well if the expected response rate is high [24]. Other approaches to improve RFM results have included Bayesian networks [1,8] and association rules [46].
2.2.2. Classical data mining tools
Common data mining practice in classification is to gather a great number of variables and apply different standard algorithms. Given a set of predefined classes and a number of attributes, these classification methods can provide a model to predict the class of other unclassified data. Mathematical techniques that are often used to construct classification methods are binary decision trees, neural networks, and logistic regression. Using binary decision trees, a tree induction model with Yes-No format can be built to split data into different classes according to their attributes. Such a model is very easy to apply to new cases, although the algorithms often produce an excessive number of rules. Neural networks often fit nonlinear relationships very well, but are difficult to apply to new data. Logistic regression models are easy to apply to new data, although the problem of a cutoff between classes can be an issue [32].

Relative performance of data mining algorithms has long been understood to depend upon the specific data. Since data mining software is widespread, common practice in classification is to try the three basic algorithms (decision trees, neural networks, logistic regression) and use the one that works best for the given data set, as the sketch below illustrates.
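For illustration, a minimal sketch of this practice in Python with scikit-learn follows. The file and column names (customers.csv, recency, frequency, monetary, responded) are hypothetical stand-ins rather than the study's actual data layout, and the hyperparameters are arbitrary:

```python
# Minimal sketch: fit the three basic classifiers and keep the one that
# scores best on a held-out split. All names below are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("customers.csv")  # hypothetical file with an RFM layout
X = df[["recency", "frequency", "monetary"]]
y = df["responded"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                                    random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # correct classification rate on the held-out test set
    print(name, round(model.score(X_test, y_test), 3))
```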
Studies have compared these algorithms with RFM. Levin and Zahavi [20] compared RFM with decision trees (specifically CHAID), pointing out that decision trees are more automatic (RFM requires extensive data manipulation), but involve modeling issues such as controlling tree size and determining the best split for branches and leaves. Kim and Street [19] proposed a neural network model and applied feature selection mechanisms to reduce input variables, enabling focus upon the most important variables. Baesens et al. [1] also applied neural networks to customer response models (adding customer profile indicators to RFM), obtaining better prediction accuracy. That is a consistent finding: data mining algorithms can be expected to predict customer response better than RFM. However, RFM remains interesting because it relies upon three fundamentally basic inputs that are readily available.
3. Methodology
3.1. Problem description and data set
This research design includes two studies (Study 1 and Study 2 hereafter) using two datasets obtained from the Direct Marketing Educational Foundation. Study 1 uses a dataset including 101,532 individual purchases from 1982 to 1992 in catalog sales. Study 2 is based on the data of 1,099,009 individual donors' contributions to a non-profit organization collected between 1991 and 2006. The purchase orders (or donations) included ordering (or donation) date and ordering amount. The last four months (Aug-Dec) of the data were used as the target period: Aug-Dec 1992 for Study 1 and
Aug-Dec 2006 for Study 2. The average response rates in Studies 1 and 2 are 0.096 and 0.062, respectively.
Data preparation and manipulation are an important stage of knowledge discovery and learning in knowledge-based marketing [35]. Fig. 1 describes our approach. The raw data contained customer behavior represented by account, order (or donation) date, order (or donation) dollars, and many other variables. We followed the general coding scheme to compute R, F, and M [17]. Various data preparation techniques (e.g., filtering, transforming) were used during this process. The order date of the last purchase (or the date of the last donation) was used to compute R (R1, R2, R3, R4, R5). The data set contained order (or donation) history and order dollars (or donation amounts) for each customer (or donor), which were used for F (F1, F2, F3, F4, F5) and M (M1, M2, M3, M4, M5). We also included one response variable (Yes or No) to the direct marketing promotion or campaign. A sketch of this derivation follows.
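As an illustration only, a minimal pandas sketch of this derivation is given below. The file and column names (orders.csv, account, order_date, dollars) are hypothetical rather than the DMEF data layout; the cutoff date corresponds to Study 1's target period:

```python
# Sketch of deriving R, F, M and the response flag from a transaction log.
# File and column names are hypothetical.
import pandas as pd

tx = pd.read_csv("orders.csv", parse_dates=["order_date"])
cutoff = pd.Timestamp("1992-08-01")        # start of the Aug-Dec target period
hist = tx[tx["order_date"] < cutoff]       # observation period

rfm = hist.groupby("account").agg(
    last_order=("order_date", "max"),
    F=("order_date", "count"),             # how often the customer purchased
    M=("dollars", "sum"),                  # how much the customer bought
)
rfm["R"] = (cutoff - rfm["last_order"]).dt.days  # how long since last purchase

# Response: did the account purchase (or donate) during the target period?
responders = set(tx.loc[tx["order_date"] >= cutoff, "account"])
rfm["responded"] = rfm.index.isin(responders).astype(int)
```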
3.2. Predictive models
3.2.1. RFM
RFM analysis typically divides the data into 125 cells, designated by the five groups on each variable. The most attractive group would be 555, i.e., Group 5 on each of the three variables [17]. A minimal coding sketch follows.
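The coding can be sketched as below, building on the hypothetical rfm frame from the previous sketch. The study entered scale-based bin limits in SPSS (Table 2); equal-width bins are used here purely for illustration:

```python
# Sketch of the 125-cell RFM coding. Labels are reversed for R so that
# group 5 is always the most attractive (most recent, most frequent,
# highest spending).
import pandas as pd

rfm["R_score"] = pd.cut(rfm["R"], bins=5, labels=[5, 4, 3, 2, 1])
rfm["F_score"] = pd.cut(rfm["F"], bins=5, labels=[1, 2, 3, 4, 5])
rfm["M_score"] = pd.cut(rfm["M"], bins=5, labels=[1, 2, 3, 4, 5])

# Each customer falls into one of 5 * 5 * 5 = 125 cells, e.g. "555".
rfm["cell"] = (rfm["R_score"].astype(str) + rfm["F_score"].astype(str)
               + rfm["M_score"].astype(str))
```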
3.2.2. RFM with balanced cells
Dividing customers or donors into 125 cells tends to result in skewness, in that the data are not evenly distributed among those cells. This skewness has been recognized as one of the problems with RFM [13,27,31]. Our approach to this issue was to seek more equal density (size-coding) so as to obtain data entries for all RFM cells. We accomplished this by adjusting cell limits to obtain more equal counts for cells in the training set.
3.2.3. RFM with Yang's value function
Previous studies [19] have pointed out the strong correlation between F and M as a limitation of RFM. The value function [45] compresses the RFM data into one variable, V = M/R, as sketched below.
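A minimal sketch under the same hypothetical frame follows. Section 4 sorts the training set on V and splits it into twenty 5% groups, which pd.qcut approximates here:

```python
# Sketch of Yang's value function: compress M and R into V = M / R,
# then examine response rates across twenty 5% groups sorted on V.
rfm["V"] = rfm["M"] / rfm["R"]   # assumes R > 0; add a small constant otherwise
rfm["V_group"] = pd.qcut(rfm["V"].rank(method="first"), q=20, labels=False)
print(rfm.groupby("V_group")["responded"].mean())
```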
3.2.4. Logistic regression (LR)
The purpose of logistic regression is to classify cases into the most
likely category. Logistic regression provides a set of parameters for
the intercept (or intercepts in the case of ordinal data with more than
two categories) and independent variables, which can be applied to a
logistic function to estimate the probability of belonging to a specified
output class [32]. Logistic regression is among the most popular data
mining techniques in marketing DSS and response modeling [24].
3.2.5. Decision tree (DT)
Decision trees in the context of data mining refer to tree structures of rules. They have been applied by many in the analysis of direct marketing data [39,40]. The data mining decision tree process involves collecting those variables that the analyst thinks might bear on the decision at issue, and analyzing these variables for their ability to predict the outcome. Decision trees are useful for gaining further insight into customer behavior, as well as leading to ways to profitably act on results. One of a number of algorithms automatically determines which variables are most important, based on their ability to sort the data into the correct output category. The method has a relative advantage over neural networks and genetic algorithms in that a reusable set of rules is provided, thus explaining model conclusions.
3.2.6. Neural networks (NN)
Neural networks are the third classical data mining tool found in most commercial data mining software products, and have been applied to direct marketing applications [4,8,19,36]. NN are known for their ability to train quickly on sparse data sets. NN separate data into a specified number of output categories. NN are three-layer networks wherein the training patterns are presented to the input layer and the output layer has one neuron for each possible category.
3.3. Performance evaluation measures
There are different methods to assess customer response model performance. We use prediction accuracy and cumulative gains to discuss the performance of different predictive customer response models. Gains show the percentage of responders in each decile. Marketers can figure out how many responders (or what proportion of responders) can be expected in a specific decile. For example, given the same mailing size (e.g., 40% of the total customers), a model capturing 70% of the responders is better than a model capturing only 60% of the responders [47]. Through cumulative gain values we can evaluate the performance of different data mining techniques [21]. Another way is to use the prediction accuracy rate of each technique. The data set employed in this research has information about who responded to the direct marketing promotion or campaign. Using R, F, and M as three predictive variables, each data mining technique develops a binary customer response model based on the training data set and applies the model to the test data set. This generates the prediction accuracy rate, the percentage of customers classified correctly [21]. The model building process is shown in Fig. 1. A sketch of both measures follows.
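As a sketch, assuming a fitted classifier `model` (any of those in the Section 2 sketch) with predict and predict_proba, and the held-out X_test and y_test from that sketch, both measures can be computed as follows:

```python
# Sketch of the two assessment measures: correct classification rate and
# cumulative gains by decile of predicted response probability.
import numpy as np

accuracy = (model.predict(X_test) == y_test).mean()

scores = model.predict_proba(X_test)[:, 1]   # predicted response probability
order = np.argsort(-scores)                  # best prospects first
hits = y_test.to_numpy()[order]
deciles = np.array_split(hits, 10)
cum_gains = np.cumsum([d.sum() for d in deciles]) / hits.sum()
# cum_gains[3] is the share of all responders captured by mailing the top 40%.
print(round(accuracy, 3), np.round(cum_gains, 3))
```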
4. Analysis and results
The analysis process consisted of model building using each data
mining technique and model assessment. For Study 1, customer re-
sponse models were developed using RFM, RFM with balanced cells,
RFM with Yang's value function, logistic regression (LR), decision tree
(DT), and neural networks (NN). Model assessment is presented with
gains and predictive accuracy.
4.1. Study 1
An initial correlation analysis was conducted, showing that there
was some correlation among these variables, as shown in Table 1.
Fig. 1. Research design: building predictive models using RFM variables.
All three variables were significant at the 0.01 level. The relationship between R and customer response is negative, as expected. In contrast, F and M are positively associated with customer response. R and F are stronger predictors of customer response.
RFM was initially applied, dividing the scales for each of the three components into five groups based upon the scales for R, F, and M. This was accomplished by entering bin limits in SPSS. Table 2 shows the boundaries. Group 5 was assigned as the most attractive group, which for R was the minimum, and for F and M the maximum.

Note the skewness of the data for F, which is often encountered. Here the smaller values dominate that metric. Table 3 displays the counts obtained for these 125 cells.
The proportion of responses (future order placed) for the data is given in Table 4.

In the training set, 10 of the 125 possible cells were empty, even with over 100,000 data points. The cutoff for profitability would depend upon the cost of promotion compared to average revenue and rate of profit. For example, if the cost of promotion were $50, average revenue per order $2000, and the average profit rate $0.25 per dollar of revenue, the profitability cutoff would be 0.1 (the breakeven computation is shown below). In Table 4, those cells with return ratios greater than 0.1 are shown in bold. Those cells with ratios at 0.1 or higher but with support (number of observations) below 50 are indicated in italics. They are of interest because their high ratio may be spurious. The implication is fairly self-evident: seek to apply promotion to those cases in bold without italics. The idea of dominance can also be applied. The combinations of predicted success for different training cell proportions are given in Table 5.
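To make the arithmetic of the profitability example explicit, the breakeven cutoff is the response rate at which expected profit per contact equals the cost of promotion:

\[
\text{cutoff} = \frac{\text{promotion cost}}{\text{revenue per order} \times \text{profit rate}} = \frac{\$50}{\$2000 \times 0.25} = 0.1
\]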
The RFM model, implemented in an Excel spreadsheet, yields the predictive model performance shown in Appendix A in the line "Basic on 0.1" (because the cutoff used was a proportion of 0.1), along with results from the other models. This model was correct (13,961 + 1337 =) 15,298 times out of 20,000, for a correct classification rate of 0.765. The error was highly skewed, dominated by the model predicting 4113 observations to be 0 that turned out to respond. An alternative model would be degenerate: simply predict all observations to be 0. This would have yielded better performance, with 18,074 correct responses out of 20,000, for a correct classification rate of 0.904. This value could be considered a par predictive performance. These results are included in Appendix A, where we report results of all further models in terms of correct classification.
Increasing the test cutoff rate leads to improved models. We used increasing cutoffs of 0.2, 0.3, 0.4, and 0.5, yielding the results indicated in Appendix A (a sketch of the sweep follows). Only the model with a cutoff rate of 0.5 resulted in a better classification rate than the degenerate model. In practice, the best cutoff rate would be determined by financial impact analysis, reflecting the costs of both types of errors. Here we simply use overall classification accuracy, as we have no dollar values to use.
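A minimal sketch of such a sweep is given below, assuming hypothetical train and test frames carrying the cell coding and responded flag from the Section 3 sketches; a cell predicts response when its training proportion of responders meets the cutoff:

```python
# Sketch of the cutoff sweep over cell-level training response rates.
cell_rate = train.groupby("cell")["responded"].mean()

for cutoff in [0.1, 0.2, 0.3, 0.4, 0.5]:
    hot_cells = set(cell_rate[cell_rate >= cutoff].index)
    pred = test["cell"].isin(hot_cells).astype(int)
    print(cutoff, round((pred == test["responded"]).mean(), 3))
```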
The correlation between F and M (0.631 in Table 1) can be seen in Table 3, looking at the R = 5 categories. In the M = 1 column of Table 3, entries are 0 for every F = 5 category, usually increasing through the M = 2 to M = 5 columns. When F = 5, the heaviest density tends to be in the column where M = 5. This skewness is often recognized as one of the problems with RFM [13,27,31]. Our approach to this issue was to seek more equal density (size-coding) to obtain data entries for all RFM cells. We accomplished this by setting cell limits by count within the training set for each variable. We cannot obtain the desired counts for each of the 125 combined cells because we are dealing with three scales. But we can come closer, as in Table 6. Difficulties arose primarily because F has integer values. Table 6 limits were generated sequentially, starting by dividing R into five roughly equal groups. Within each group, F was then sorted into groups based on integer values, and then within those 25 groups, M was divided into roughly equally sized groups (a sketch of this sequential balancing follows).
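The sequential procedure can be sketched as below under the hypothetical frame used earlier. Note the paper keeps integer boundaries for F, which leaves cells uneven; the rank-based quintiles here force near-equal counts and are a simplification:

```python
# Sketch of sequential balancing: split R into five roughly equal-count
# groups, then F within each R group, then M within each R-F group.
import pandas as pd

def quintile(s):
    # rank(method="first") breaks ties so bin edges stay unique;
    # duplicates="drop" guards the rare group with fewer than five rows
    return pd.qcut(s.rank(method="first"), q=5, labels=False,
                   duplicates="drop") + 1

train["R_b"] = quintile(-train["R"])   # negate so small R (recent) -> group 5
train["F_b"] = train.groupby("R_b")["F"].transform(quintile)
train["M_b"] = train.groupby(["R_b", "F_b"])["M"].transform(quintile)
```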
The unevenness of cell densities is due to the uneven numbers in the few integers available for the F category. The proportion of positive responses in the training set is given in Table 7.

If M = 5, this model predicts above-average response. There is a dominance relationship imposed, so that cells 542 and better, 532 and better, 522 and better, 512 and better, 452 and better, 442 and better, and 433 and better predict above-average response. Cells 422, 414, and 353 have above-average training response, but cells with superior R or F ratings have below-average response, so these three cells were dropped from the above-average response model. The prediction accuracy ((13,897 + 734)/20,000) for this model was 0.732 (see the "Balance on 0.1" row in Appendix A). In this case, balancing cells did not provide added accuracy over the basic RFM model with unbalanced cells. Using the cutoff rate of 0.5, the model is equivalent to predicting the combination of R = 5, F = 4 or 5, and M = 4 or 5 as responding and all others not. This model had a correct classification rate of 0.894, which was inferior to the degenerate case. For this set of data, balancing cells accomplished better statistical properties per cell, but was not a better predictor.
Since F is highly correlated with M (0.631 in Table 1), the analysis can be simplified to one dimension. Dividing the training set into groups of 5%, sorted on V, generates Table 8.
Table 1
Variable correlations.

Study 1     R        F       M       Ordered
R           1
F           -0.192   1
M           -0.136   0.631   1
Ordered     -0.235   0.241   0.150   1

Study 2     R        F       M
M           -0.125   0.340   1
Response    -0.266   0.236   0.090