By: Rana Soudagar
Supervisor: Dr. Keramati
Advisor: Dr. Limayem
Keywords: Customer segmentation, RFM model, K-means clustering algorithm, EM clustering
algorithm, Generalized Differential RFM method (GDRFM).
Table of Contents
Chapter1: Introduction
1.1 Background of the study
1.2 Problem definition
1.3 Purpose of this study
1.4 Research question
1.5 Research motivation
1.6 Research demarcation
1.7 Research outline
Chapter2: Literature Review
2.1 Review of Customer Segmentation based on RFM method
2.2 Review of customer segmentation based on Customer Value Matrix Model
2.2.1 Methodology of the Customer Value Matrix
2.3 Review of Customer Segmentation based on Data Mining
2.4 Review of Clustering methods
2.4.1 K-means method
2.4.2 EM (Expectation Maximization) Clustering Method
2.5 Review of Customer segmentation Models based on CLV
Chapter3: Research Methodology
3.1 Research Purpose
3.2 Research Approaches
3.3 Research Strategy
3.4 Data mining process
3.4.1 Data Collection Method
3.4.2 Data Pre-Processing
3.4.2.1 Data cleaning and Integration
3.4.2.2 Data Transformation
3.4 Customer Segmentation based on RFM Model
3.4.1 Frequency, Monetary and Purchase Change rate (FMC) Model
3.4.2 Generalized Differential RFM method (GDRFM)
3.5 Data Clustering and Customer Segmentation
3.6 Strategy Definition per Segment
Chapter4: Results & Analysis
4.1 Data preprocessing
4.1.1 Data Cleaning
4.1.2 Data integration
4.1.3 Data Transformation
4.1.4 RFM Construction
4.2 Customer segmentation
4.2.1 Customer Value Matrix Results
4.2.2 RFM Method Results
4.2.3 FMC Method Results
4.2.4 GDRFM Method Results
4.3 Chapter summary
Chapter5: Strategy Definition
5.1 Best Ascending Segment
5.2 Best Descending Segment
5.3 Best Frequency Descending Segment
5.4 Best Monetary Descending Segment
5.5 Spenders Segment
5.6 Frequent Segment
5.7 Uncertain Segment
5.8 Chapter Summary
Chapter6: Conclusion and further research
6.1 Conclusion
6.2 Contribution
6.3 Limitations
6.4 Future Works
Reference
List of tables
Table 5.2 Best Descending segment specifications
Table 5.3 Best Frequency Descending segment specifications
Table 5.4 Best Monetary Descending segment specifications
Table 5.5 Spender segment specifications
Table 5.6 Frequent segment specifications
Table 5.7 Uncertain segment specifications
Table 5.8 Characteristics and strategies for all customer segments (continued)
Chapter1: Introduction
1.1 Background of the study
Customers are regarded as important strategic resources of an enterprise, and gaining and
retaining customers has become the most critical factor in an enterprise’s success (Lai, 2009).
By gaining an overall understanding of customers and then grouping them into categories,
companies are better able to optimize marketing programs, satisfy customers and increase profits
(Chen Y. and Li, 2009). Hence, for a company facing a competitive environment, achieving
efficient customer segmentation in order to apply high-quality recommendation strategies is a key
task. Traditionally, customer segmentation is achieved using statistics-based methods that
compute a set of statistical measures from the customer data and then group customers into
segments by applying clustering algorithms in the space of these statistics (Jiang T. and Tuzhilin
A., 2006). Customer segmentation is common in real life. For instance, many business entities
differentiate their customers into members and non-members. Also, many enterprises provide
different service levels for different classes of customers. For example, customers who pay a
higher shipping fee receive their orders more quickly than those who pay a lower one
(Chen Y. and Li, 2009).
Customer segmentation can effectively lower the marketing costs of a company and help it
achieve more visible and profitable market penetration (Lai, 2009). It allows companies to design
and establish different strategies to maximize the value of customers (Cao et al, 2010).
The world around us changes continuously. For an internet service provider (ISP),
knowledge about what is changing and how it has changed is essential. One of the most
important aspects of customer segmentation for an ISP is constructing an efficient strategy for
dealing with customers. Furthermore, in today’s highly competitive market, customers are faced
with various providers with different market strategies. In such a situation, managers must be
aware of customer behavior and of the situation of customers in each segment, and it is necessary
to mine customer data to reach this goal. The most important key to success in a competitive
situation, however, is the definition of proper strategies for interacting appropriately with
different customer groups. A small number of ISP companies in Iran have tried to mine their
customers’ information in traditional ways, but there is no complete and comprehensive research,
or at least no published report, on the application of this valuable and important topic in Iran.
1.2 Problem Definition
In Iran, approximately one third of the population uses Internet services. Many companies in
Iran provide different internet services with different technologies; here we call all of them
Internet Service Providers (ISPs). Based on a preliminary survey, these companies do not have
complete information about their customers, and as a result there is no reported study on the
application of customer segmentation in Iranian ISP companies. The lack of such a study causes
inefficiency and many shortcomings and problems for ISPs. Since they do not have a clear view of
their customers, they cannot adopt proper strategies and actions to gain competitive advantages
in the market. They waste much of their resources and profit because they treat all of their
customers the same way. One of the best-known problems of internet service providers is that many
customers change their service provider frequently, so the companies have many churning
customers. Certainly, one of the main reasons for this phenomenon is the lack of predefined
strategies for different customer groups and the lack of customer segmentation in ISPs. The goal
of this thesis is to apply customer segmentation in an ISP and to provide a different strategy
for each segment using the results obtained.
ATINET Company is the first ISP in Hamadan province. It works in different internet
service categories such as dial-up, ADSL, wireless and broadband. The company now faces the
challenge of increasing competition, for several reasons. First of all, owing to the high demand
for internet access, services with higher speeds are required every day and user requirements
increase exponentially. In this situation, nobody knows what next year’s technologies and
services will be. This fast growth of the internet forces companies to switch to new services
rapidly, so ISPs face the challenge of a constantly evolving market in which customer needs
change all the time. There are also some powerful companies that make competition tighter for
ATINET. In such a market, customer segmentation can help the company find strategies to win the
competition. ATINET also needs to improve customer satisfaction in order to improve its
competitiveness in the face of these challenges. These goals can be reached by a set of actions,
the first of which is customer segmentation and the definition of a strategy for each segment.
1.3 Purpose of this study
A deeper understanding of customers has validated the value of focusing on them. It is now
generally accepted that it costs about five times more to gain a new customer than to keep an
existing one, and ten times more to get a dissatisfied customer back (Marcus C., 1998). Studies
across numerous industries have also shown that a five-point increase in customer retention can
increase profits by more than 25 percent (Marcus C., 1998). Looking more closely at these
statistics, it is no wonder that managers increasingly consider marketing a powerful tool for
their enterprises. It is expected that the overall market for software and services using
data mining technology will grow. Given this fast growth of data mining technology, database
marketing applications such as customer segmentation are receiving attention.
According to (Lai, 2009), an analysis of traditional methods of customer segmentation shows
that customer segmentation methods based on data mining are more advantageous in the
following regards:
• The results of segmentation based on data mining are determined by the data themselves;
the subjectivity of the people who process them is avoided, resulting in a more
objective representation of the differences among different populations.
• It represents the distinguishing features of different customer categories more
comprehensively, which helps marketing staff know their customers more thoroughly
and in turn make more targeted and individualized marketing plans.
• Changes in customer behavior can be tracked more easily by collocating clustering
analysis models and updating the categorization of customers regularly.
In this study, a customer segmentation process is implemented to segment the customers and
define strategies for them. To reach this goal we need to find customer information and
collect data in a database. We need to collect as much data as possible about interactions between
customers and the business, analyze this data to turn it into information, and finally learn from
it and take action (Böttcher et al, 2009). This process is supported by techniques from data
mining. As one of the most important techniques of data mining, clustering analysis has emerged
as a method for customer segmentation. It aims to recognize a set of clustering rules and group
the customers into several clusters (Cao et al, 2010). Nowadays, clustering analysis in the field
of customer segmentation includes algorithms such as partitional clustering, density-based
clustering, grid-based clustering, fuzzy clustering and hierarchical clustering (Cao et al, 2010).
Based on the above considerations, we can see that by analyzing the information obtained
from the segmentation of customer behavior, a company can provide its customers with the
products and services they truly need and can make its best effort to maximize customer
retention and profitability. The purpose of this study is to apply a customer segmentation
method to an internet service provider in Iran and then to define proper strategies for each
segment. To do so, customer segmentation models that are suitable and applicable for our test
case must be analyzed.
A business cannot deploy its marketing budget equally across all customer segments. By
focusing marketing resources on the top customer segment, it can improve overall revenue and
also increase retention of the best customers. For an internet service provider like ATINET
Company, segmentation can improve the company's ability to cope with the variety of
services and competitors in the market. Customer segmentation will help ATINET Company to
focus on the best actions to generate more profit, minimize downsides, and find and exploit
upsides. This can increase profitability and help ATINET identify strengths and weaknesses in
its overall business strategy.
Chapter2: Literature Review
Customer segmentation provides an enterprise with a full-range management perspective,
gives enterprises a great chance to communicate with customers, and enhances the
return rate of customers (Gong and Xia, 2009).
Customer segmentation needs a comprehensive understanding of a company’s customers.
Since enterprises must make more scientific decisions about the future, different methods for
describing customer behavior exist in the literature. Among them, there are various types of
applications based on data mining, the RFM method, the Customer Value Matrix and the CLV method.
Many applications of customer segmentation are based on personal customer attributes like
sex, age, education, etc. RFM analysis can be conducted using data mining methods, especially
clustering methods. Applying these data mining and clustering methods yields more useful
information and analysis results. On the other hand, the Customer Value Matrix is a method
that is easy to implement and understand.
The last and best-known of these methods is the customer lifetime value method, which
has been studied in many cases and by many enterprises.
It must be noted that there are many other customer segmentation methods in the literature
which are not presented here because their application differs fundamentally from ours
in this study. For example, online purchasing behavior can be used to
segment customers based on their purchasing sequences (Wang H. et al, 2006).
In this chapter we briefly review the above methods and related published studies.
include customer’s consumption interval, frequency and amount of money spent. The RFM model
uses these three variables to distinguish important customers. These variables are defined
in the literature as follows:
• Recency (R): the latest purchase time.
• Frequency (F): the total number of purchases during a specific period.
• Monetary (M): monetary value spent during one specific period.
R stands for recency indicating the interval between the time when the latest consuming
behavior occurs and the current time. F stands for frequency indicating the frequency of
consuming behavior in a period of time. M stands for monetary indicating consumption amount
of money in a period of time.
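The three definitions above can be illustrated with a minimal sketch that derives R, F and M from a list of purchase records. The function name `compute_rfm`, the tuple layout of a transaction and the sample data are assumptions made for illustration only and are not taken from the case study; recency is measured here in days between the latest purchase and a chosen reference date.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(transactions, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples,
    all assumed to fall inside the observation period.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Recency only needs the most recent purchase date per customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # F: number of purchases in the period
        monetary[customer_id] += amount  # M: total amount spent in the period
    return {
        cid: ((reference_date - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical sample data, for illustration only.
sample = [
    ("c1", date(2011, 1, 10), 20.0),
    ("c1", date(2011, 3, 5), 30.0),
    ("c2", date(2011, 2, 1), 15.0),
]
rfm = compute_rfm(sample, reference_date=date(2011, 3, 31))
```

In this sketch a smaller recency value means a more recent purchase; studies that score RFM variables usually rescale the three values before clustering.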
A large number of studies have considered the RFM method, and previous studies in this
area highlight the importance of the RFM variables.
(Aggelis, Y., 2005) studied the RFM scoring of active e-banking users. This paper used
clustering techniques, as one of the methods of data mining, to organize observed examples into
clusters (groups) based on the pyramid model shown in figure 2.1. The K-means algorithm and
the two-step clustering method were selected as clustering algorithms. The results allowed the
bank to easily identify its most important users-customers.
In (Sohrabi and Khanlari, 2007), the authors estimated customer lifetime value by calculating
RFM variables, then clustered the bank's customers and proposed customer retention
strategies for the customers of an Iranian private bank.
2.2.1 Methodology of the Customer Value Matrix
The first step is the collection of data to create the Customer Value Matrix. A customer
identification (ID) number, the date of a purchase and the total amount of the purchase are the
data that must be extracted from the enterprise’s database. The customer ID number is used to
associate purchases with the appropriate customer, and the total amount of each purchase is used
to calculate the Average Purchase Amount (Marcus, C., 1998).
The next step is the segmentation process. In the initial step of this process, the average
values for the Number of Purchases and the Average Amount Spent must be calculated. After that,
each customer is allocated to one of the four resulting quadrants, which are shown in figure 2.1.
Table 2.1 shows the parameters that must be calculated for the segmentation.
According to (Marcus, C., 1998), the Average Number of Purchases is calculated by taking the
total number of purchases for the customer base and dividing it by the total number of customers
in the customer base. The Average Purchase Amount is derived by taking the total revenue and
dividing it by the total number of purchases (see table 2.1).
Comparing each customer’s Average Number of Purchases and Average Purchase Amount
with the total average values is the next step of the Customer Value Matrix process. Each customer
is then assigned to one of the four quadrants based on whether the customer is above or below
the axis averages.
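The quadrant assignment described above can be sketched as follows. This is an illustrative sketch only: the function name `customer_value_matrix` and the input layout are hypothetical, the axis averages follow the definitions given in the text (total purchases divided by total customers, and total revenue divided by total purchases), and the quadrant labels are an assumption borrowed from the segment names used later in this thesis (Best, Spender, Frequent, Uncertain). Treating a value equal to the average as "below" is also an assumption.

```python
from collections import defaultdict


def customer_value_matrix(transactions):
    """Assign each customer to a quadrant of the Customer Value Matrix.

    transactions: iterable of (customer_id, amount) pairs, one per purchase.
    Returns {customer_id: quadrant_label}.
    """
    purchases = defaultdict(int)
    revenue = defaultdict(float)
    for customer_id, amount in transactions:
        purchases[customer_id] += 1
        revenue[customer_id] += amount

    total_purchases = sum(purchases.values())
    total_revenue = sum(revenue.values())
    # Axis averages as defined in the text.
    avg_number_of_purchases = total_purchases / len(purchases)
    avg_purchase_amount = total_revenue / total_purchases

    # Assumed labels: (above-average frequency?, above-average amount?) -> name.
    labels = {
        (True, True): "Best",
        (False, True): "Spender",
        (True, False): "Frequent",
        (False, False): "Uncertain",
    }
    segments = {}
    for cid in purchases:
        high_frequency = purchases[cid] > avg_number_of_purchases
        high_amount = revenue[cid] / purchases[cid] > avg_purchase_amount
        segments[cid] = labels[(high_frequency, high_amount)]
    return segments


# Hypothetical sample: four customers, one per quadrant.
sample = (
    [("a", 50.0)] * 4 + [("b", 100.0)] + [("c", 5.0)] * 4 + [("d", 5.0)]
)
segments = customer_value_matrix(sample)
```

Because only a customer ID and a purchase amount per transaction are needed, the matrix can be built from very little data, which is consistent with its description as easy to implement.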
Table 2.1 Information table for customer value matrix
Source: (Marcus, C., 1998)
Average number of purchases = Total number of purchases / Total number of customers
Average purchase amount = Total sales / Total number of customers
(Madani, S., 2009) used the customer value matrix to apply RFM in a small-business retail
environment. She used three types of data: purchasing transaction data for extracting RFM,
customer data and product data. In her study, RFM variables are extracted from purchasing transaction
data to analyze customer behavior. After segmentation, association rules are used to describe customer
behavior by building customer behavior patterns and capturing changes in purchase behavior.
data information and knowledge (Gong and Xia, 2009). By using data mining technology,
enterprises can sort, handle and analyze huge amounts of sophisticated customer data.
Data mining is the process of sorting through large amounts of data and picking out
appropriate information and knowledge by using a series of modern techniques (Xin-a Lai,
2009). Data mining involves the use of sophisticated data analysis tools to discover previously
unknown, valid patterns and relationships in large data sets. These tools can include statistical
models, mathematical algorithms and machine learning methods (algorithms that improve their
performance automatically through experience, such as neural networks or decision trees).
Consequently, data mining consists of more than collecting and managing data; it also includes
analysis and prediction (Cheng Li, 2008).
Data mining includes association, sequence or path analysis, classification, clustering, and
forecasting of future activities.
According to the characteristics of data mining and the requirements of an enterprise,
a process model of customer segmentation based on data mining can be presented as shown in
figure 2.2 (Lai, 2009).
The implementation of a data mining system has a complete flow structure, generally
composed of four main stages: identification of business objectives, data preprocessing, data
mining and modeling, and model evaluation and expression, as shown in figure 2.4 (Gong and
Xia, 2009).
Data mining is the main step of the knowledge discovery in databases (KDD) process.
As depicted in figure 2.5, the KDD process consists of the following steps: data selection,
data cleaning, data transformation, pattern searching (data mining), presentation of findings,
interpretation of findings, and evaluation of findings.
In the following, some studies that perform customer segmentation based on data
mining technology are presented.
(Lai, 2009) stated that the most frequently used customer segmentation technique in
data mining is clustering analysis. Clustering analysis can be used to categorize customers based
on differentiating features such as address, age, sex, income, occupation, education
level, etc. Meanwhile, clustering analysis can generate the different levels of importance
associated with different variables in the classification process; those data can be used to assist
decision-makers.
(Gong and Xia, 2009) studied a specific implementation of data mining
processes and technology for customer segmentation in a supermarket. The main aim of this
work was to apply methods of customer segmentation based on customer purchase behavior
to formulate a model, so that enterprises can understand their customers profoundly and make
more scientific decisions about the future.
Data mining tasks are very distinct and diverse because many patterns exist in a huge
database. Different methods and techniques are needed to find different kinds of patterns.
According to (Zaïane, 1999), the data mining functionalities and the kinds of knowledge they
discover are: characterization, discrimination, association analysis, classification, prediction
and clustering.
The authors of (Chen et al, 2006) build a customer segmentation function model based on data
mining and summarize its advantages in customer relationship management (CRM). This
segmentation model first segments customers according to the mapping relationship between
customers’ attributes and connection category, and subsequently constructs the mapping
relationship between attribute space and conception space.
(Li, 2008) worked on combining data mining technology with customer segmentation theory
in aviation freight.
2.4 Review of Clustering Methods
Clustering is similar to classification, except that in clustering the class labels are
unknown, and the algorithms work to identify a limited set of categories or clusters, not only to
describe the data but also to determine acceptable classes.
Clustering analyzes data objects without consulting a known class label. Clustering can
also facilitate taxonomy formation. Customer Analytics Taxonomy and customer behavior
metrics will be explained in detail in the next chapter.
Clustering methods can be categorized into two different types of algorithms: hierarchical
algorithms and non-hierarchical or partitional algorithms (Yuanli T. and Liangshan
sh., 2010, citing Sağlam et al, 2006 and Zhongding et al, 2009).
Hierarchical algorithms (HC) find successive clusters by using previously established
clusters. They start with a single cluster containing all instances and end when a predefined
terminating criterion is met. Density-based clustering algorithms are designed to find
arbitrary-shaped clusters, in which a cluster is considered a region where the density of data
objects exceeds a threshold (Yuanli T. and Liangshan sh., 2010).
In hierarchical algorithms, the number of clusters does not need to be known in the beginning,
which is a strong advantage of these algorithms over non-hierarchical methods. On the other hand,
once an instance is assigned to a cluster, the assignment is irrevocable. Therefore, the
output of hierarchical methods can be used to generate some interpretations of the data set and
may be used as input for a non-hierarchical method in order to improve the resulting cluster
solution.
Non-hierarchical or partitional algorithms (NHC) typically determine all clusters at once,
but they can also be used as divisive algorithms in hierarchical clustering. In these algorithms,
the data are usually divided into k clusters at once, and the NHC algorithm iterates over all
possible movements of data points between the formed clusters until a stopping criterion is met.
In these methods, each cluster can be represented by the center of the cluster (K-Means) or by one
instance located near the cluster center (K-Medoids). NHC algorithms are sensitive to the initial
partition, and because of this there exist many local minima (Sağlam et al, 2006).
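The difference between the two cluster representatives mentioned above can be shown with a small sketch. The helper names `centroid` and `medoid` and the sample points are hypothetical and chosen only for illustration; Euclidean distance is assumed.

```python
import math


def centroid(points):
    """Attribute-wise mean of the points (the K-Means representative)."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def medoid(points):
    """Member point with the smallest total distance to all other members
    (the K-Medoids representative)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical cluster with one distant member.
cluster = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
center = centroid(cluster)    # need not be one of the data points
central_point = medoid(cluster)  # always one of the data points
```

For this sample the centroid is pulled towards the distant member and does not coincide with any customer record, whereas the medoid is an actual instance of the cluster.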
(Su-li, 2010) implemented customer segmentation in a commercial bank, applying
unascertained clustering to divide the bank's customers. Although the
commercial bank is concerned with customer life cycle value, this study improved the
customer evaluation method. The new method adequately calculates the currency value,
non-currency value, current value and potential value. It takes the customer currency as the main
evaluation indicator and the other indicators as auxiliary indicators. By combining
quantitative and qualitative evaluation, the customer value is evaluated comprehensively.
Unascertained clustering overcomes the deficiency of C-means clustering
and gives a quantitative description of the sample characteristics. By applying unascertained
clustering, the paper divides the commercial bank's customers into quality customers, backbone
customers, mass customers and low-class customers.
In the following sections we review two of the best-known and most popular
clustering methods.
The K-means algorithm calculates cluster centers iteratively; the steps of the K-means
algorithm are given in Figure 2.6.
STEP1
Randomly select k points (they can also be examples) to be the seeds for the centroids of k clusters.
STEP2
Assign each example to the centroid closest to it, forming in this way k exclusive clusters of examples.
STEP3
Calculate new centroids of the clusters. For that purpose, average all attribute values of the examples belonging to the
same cluster (centroid).
STEP4
Check whether the cluster centroids have changed their "coordinates". If yes, start again from step 2. If not, cluster
detection is finished and all examples have their cluster memberships defined.
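The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `k_means`, the `seed` and `max_iterations` parameters, the use of Euclidean distance, the choice of seeds among the examples themselves, and the handling of empty clusters are all assumptions.

```python
import math
import random


def k_means(examples, k, seed=0, max_iterations=100):
    """Cluster `examples` (a list of numeric tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index of
    the cluster that examples[i] belongs to.
    """
    rng = random.Random(seed)
    # STEP 1: pick k distinct examples at random as the initial centroids.
    centroids = rng.sample(examples, k)
    for _ in range(max_iterations):
        # STEP 2: assign each example to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(x, centroids[c]))
            for x in examples
        ]
        # STEP 3: recompute each centroid as the attribute-wise mean of its members.
        new_centroids = []
        for c in range(k):
            members = [x for x, a in zip(examples, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # STEP 4: stop when the centroids no longer change, otherwise repeat.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical two-dimensional data with two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(data, k=2)
```

Changing `seed` changes the initial centroids; on less clearly separated data this can lead to different final clusters from run to run.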
This algorithm is suitable for large amounts of data. Considering its simplicity and
speed, it is faster than hierarchical clustering, and for globular
clusters K-Means may produce tighter clusters than hierarchical clustering. But there are some
disadvantages to using this technique. It does not give the same result in each run, since the
final clusters depend on the initial random assignment. Another disadvantage of this algorithm is
that comparing the quality of the clusters produced is difficult. It is also not
appropriate for non-globular clusters.
and details of applying this method can be found on the Wikipedia website and in many other
resources, and have been omitted here.
(Kim et al, 2006) suggested a new Life Time Value (LTV) model and segmented
customers based on their value. After segmenting customers, they proposed marketing strategies
according to customer segments in their case study, a wireless telecommunication
company. This study includes three phases. The data of this study consist of 6 months of service
data from a wireless communication company in Korea. Phase I is data preparation and setting up
marketing strategies. The dataset used in this study is composed of 200 data
fields and 16,384 customer records. After the preparation step, customer value is
evaluated from three viewpoints: current value, potential value and customer loyalty. In Phase II,
segment analysis is performed. Phase III analyzes the characteristics of each segment and
presents the procedure for building strategies based on these three customer values. The
method used for segmentation analysis is a decision tree, used for mining the characteristics of
customers.
(Hwang et al, 2004) and (Kim et al, 2006) suggested a new Life Time Value (LTV) model.
They segmented the customers of a wireless communication company considering a customer's
past profit contribution, potential benefit and defection probability.
These papers measure the leaving probability of each customer to calculate the churn rate
using data mining techniques; they build several models (decision tree, neural network and
logistic regression) and then select the optimal model among them based on the results of a
comparative test.
(Ruiz et al, 2004) studied a segmentation of customers based on their activities. The
clustering algorithm used in the study is the P-median method.
(Henry Chan, 2008) presented a novel approach that combines customer targeting and
customer segmentation for campaign strategies. This investigation identifies customer behavior
using a recency, frequency and monetary (RFM) model and then uses a customer lifetime value
(LTV) model to evaluate the proposed customer segments. To select more appropriate
customers for each campaign strategy, this work proposed using a genetic algorithm (GA). The
paper performed an empirical study of a Nissan automobile retailer, segmenting over 4000
customers to demonstrate the efficiency of the proposed method. As shown in figure
2.10, this work was implemented in six phases.
Source: (Henry Chan, 2008)
Figure 2.7 the framework
(Haining et al, 2010) established an index system for dynamic customer segmentation based
on customer lifetime value, mined from China Telecom's database. In this paper they introduced
evaluation indices for the telecom industry. They studied how to achieve dynamic customer
segmentation and how to increase the objectivity of this index system in describing customer
behavior.
Table 2.2 shows the brief view of literature that was studied in this thesis.

Title: Intelligent value-based customer segmentation method for campaign management
Authors/year: Chu Chai Henry Chan (2008)
Major method: RFM model, LTV model and genetic algorithm (GA)
Case study: Nissan automobile retailer
Purpose: Presenting a novel approach that combines customer targeting and customer segmentation for campaign strategies.
Methodology:
-Gathering data and establishing a basic customer profile.
-Building the RFM model.
-Then the LTV model calculates current customer value and predicts potential customer value.
-Finally, applying GA to select the optimum of customer segmentation for each marketing strategy.
Conclusion: This study suggests an intelligent model that uses GA to select customer RFM behavior using an LTV
evaluation model. If the proposed methodology is applied, high-value customers can be identified for campaign
programs, and it considers the correlation between customer values and campaigns. Therefore, valuable customers
can be identified for a campaign program.

Title: Mining Changes in Customer Purchasing Behavior
Authors/year: Samira Madani (2009)
Major method: CLV, association rules, Apriori algorithm
Case study: Kalleh Company
Purpose: Mining changes happening in customer behaviors of a company.
Methodology:
-Data preprocessing.
-Customer segmentation based on the Customer Value Matrix.
-Using the Apriori algorithm for recognizing and mining patterns of behavior.
Conclusion: The results show different kinds of changes, including added/perished rules, emerging patterns and
unexpected changes. Also, two measures, similarity and unexpectedness, have been identified.
Title: Research on Segmentation implementation process of air cargo customer based on Data Mining
Authors/year: Cheng Li (2008)
Major method: Data mining, clustering analysis
Case study: Air cargo
Purpose: Implementing customer segmentation for aviation cargo based on data mining. Describing the hierarchical
design idea and functions of different levels, which will have some reference value for the airlines to start CRM.
Methodology:
-Data preprocessing.
-Customer segmentation based on customer value (current value and value-added).
-Forecasting model of customer value in the air cargo industry.
-Definition of marketing strategies.
Conclusion: This work analyzes the segmentation of freight customers; its connection with data mining theory can
help air cargo businesses to find customers with real value and analyze their features so as to retain them.
Title: Customer Lifetime Value (CLV) Measurement Based on RFM Model
Authors/year: B. Sohrabi and A. Khanlari (2007)
Major method: K-Means clustering, CLV and RFM
Case study: Iranian private bank
Purpose: This paper aims at suggesting a new CLV model and customer segmentation considering the RFM model. It also
proposes customer retention strategies after segmenting the customer base.
Methodology:
-RFM variables calculation.
-Building the CLV model.
-Clustering customers by the K-means algorithm.
-Proposing customer retention strategies.
Conclusion: This paper suggested a CLV model considering RFM at the same time. It clusters customers into
segments according to their lifetime value expressed in terms of RFM.
Title: Improved K-Means Cluster Algorithm in Telecommunications Enterprises Customer Segmentation
Authors/year: J. Zhao et al (2008)
Major method: K-Means algorithm, clustering algorithm
Case study: Telecommunications enterprises
Purpose: The aim of this paper is introducing an improved K-Means algorithm and designing a
model of telecommunications enterprises customer segmentation.
Methodology:
-Finding a set of data objects that reflect the data distribution and take it as the cluster center.
-Performing Clustering
Conclusion: By comparison with original algorithm in terms of time of iterations and accuracy,
improved K-Means was more stable and also more advance. The segmentation results obtained can be
used as the data basis in differentiated services for customers and have positive significance
for product design and phone packages recommendation.
Title: Joint optimization of customer segmentation and marketing policy to maximize long-term
profitability
Authors/year: J. Jonker et al (2008)
Major method: RFM
Case study: Dutch charitable organization
Purpose: Presenting a joint optimization approach addressing two issues: (1) the segmentation of
customers into homogeneous groups of customers, (2) determining the optimal policy towards each
segment.
Methodology:
Determining segmentation.
The optimal marketing policy is determined for the given segmentation.
In order to find new candidate segmentation, this paper proposes to adopt a local search method.
Applying method in a direct mailing framework.
Conclusion: The results show that their model leads to a significant improvement over CHAID, a
model that determines an optimal strategy given segmentation. They also see that the best
segmentations proposed by their method are almost identical. This indicates that our proposed
method does not converge to various different local optima.

Title: A practical yet meaningful approach to customer segmentation
Authors/year: Claudio Marcus (1998)
Major method: RFM
Case study: -
Purpose: The purpose of this article is to introduce a simple yet powerful approach to customer
segmentation. It is called the Customer Value Matrix.
Methodology:
Data gathering
Calculating Average Number of Purchases and Average Purchase Amount
Segmentation by proposed method (Customer Value Matrix)
Defining some strategies and tactics.
Conclusion: The result shows that the Customer Value Matrix provides an affordable, easy to
implement segmentation methodology that delivers substantial value relative to the amount of
effort involved.
Chapter3: Research Methodology
Research Purpose
Research Approaches
Research Strategy
Data mining process
Data Collection Method
Data Pre- Processing
Data cleaning
Data Transformation
Customer Segmentation based on RFM Model
Frequency, Monetary and Purchase Change rate (FMC) Model
Generalized Differential RFM method (GDRFM)
Data Clustering and Customer Segmentation
Strategy Definition per Segment
3.1 Research Purpose
According to Zhahang et al. (2006), the research purpose expresses what should be achieved by
conducting the research and how its results can be used. Research can be classified by its
purpose as exploratory, descriptive, explanatory or predictive. Exploratory research looks for
patterns, ideas or hypotheses rather than testing or supporting a hypothesis. It can be conducted
through a literature search, surveys of experts about their experiences, focus groups and case
studies.
In contrast, descriptive research identifies and obtains information on an accurate profile of a
person or the characteristics of a particular issue. Descriptive research is often used when a
problem is well structured and there is no intention to investigate cause-effect relationships
(Xi Zhang X. and Tang Y, 2006).
Analytical or explanatory research seeks to understand phenomena by searching for and analyzing
causal relationships between cause and effect. It is a continuation of descriptive research.
Predictive research goes further by forecasting certain events on the basis of hypothesized
relationships, so that the analysis can be generalized to similar conditions. Table 3.1 shows the
differences among these aspects of research.
The purpose of this thesis is descriptive. The descriptive data will be collected and analyzed.
3.2 Research Approach
There are two main research approaches to choose from when conducting scientific research:
quantitative and qualitative (Madani, S., 2009). The approach to be used depends on the
characteristics of the gathered information and the data types. Indeed, the most important
difference between the two approaches is how data and statistics are used (Wang C. and
Wang Zh., 2006); the choice is also related to the purpose of the study and the research
questions. Quantitative research deals with numbers, logic and objective facts. It is based on
the measurement of variables, the delivery of findings in numerical form, and analysis conducted
through diagrams and statistics.
On the other hand, qualitative research focuses on non-numerical data collection or
explanation based on the attributes of the graph, with analysis conducted through
conceptualization.
Based on the purpose and research questions, the chosen approach for this thesis is the
quantitative approach.
Source: (Yin, 2003, p.5)
Table 3.2 Different Type of Research purpose
Strategy            Form of research questions          Requires control over   Focuses on
                                                        behavior event          contemporary
Experiment          How, Why?                           Yes                     Yes
Survey              Who, what, how many, how much?      No                      Yes
Archival analysis   Who, what, how many, how much?      No                      Yes/No
History             How, Why?                           No                      No
Case study          How, Why?                           No                      Yes
The case study strategy is a common strategy in business research that is usually associated
with the quantitative approach. It is based on an in-depth investigation of a single individual,
group, or event. A fundamental difference between case studies and these alternative methods is
that the case study researcher may have less a priori knowledge of what the variables of interest
will be and how they will be measured (Benbasa et al, 1987).
The focus of this study is customer segmentation, and the data have been collected from an
Internet service provider database. Therefore, it uses the case study as the research strategy.
The characteristics of case studies are shown in Table 3.3.
hypotheses.
10. Case research is useful in the study of "why" and "how" questions because these deal with operational links to
be traced over time rather than with frequency or incidence.
11. The focus is on contemporary events.
Source (Yin, 2003, p.86)
Table 3.4: Six Sources of Evidences: Strengths and Weaknesses

Source of evidence: Documentation
Strengths:
+ Stable: can be reviewed repeatedly
+ Unobtrusive: Not created as a result of the case
+ Exact: Contains exact names, references, and details of an event
+ Broad Coverage: Long span of time, many events, and many settings
Weaknesses:
- Retrievability: Can be low
- Biased Selectivity: If collection is incomplete
- Reporting bias: Reflects (unknown) bias of author
- Access: May be deliberately blocked

Source of evidence: Archival records
Strengths:
+ (Same as above for Documentation)
+ Precise and quantitative
Weaknesses:
- (Same as above for Documentation)
- Accessibility due to privacy blocked

Source of evidence: Interviews
Strengths:
+ Targeted: Focuses directly on case study topic
+ Insightful: Provides perceived causal inferences.
Weaknesses:
- Bias due to poorly constructed questions
- Response bias
- Inaccuracies due to poor recall
- Reflexivity: Interviewee says what interviewer wants to hear

Source of evidence: Direct observations
Strengths:
+ Reality: Cover events in real life
+ Contextual: Covers context of event
Weaknesses:
- Time consuming
- Selectivity: Unless broad coverage
- Reflexivity: Event may proceed differently because it is being observed
- Cost: Hours needed by human observers

Source of evidence: Participant Observations
Strengths:
+ (Same as above for direct observations)
+ Insightful into interpersonal behavior and motives
Weaknesses:
- (Same as above for direct observations)
- Bias due to investigator’s manipulation of events

Source of evidence: Physical Artifacts
Strengths:
+ Insightful into cultural features
+ Insightful into technical operations
Weaknesses:
- Selectivity
- Availability

Many studies use questionnaires for data collection. The questionnaire questions were
rarely specified and, when they were, it was in a very general form. Sometimes the researchers
mentioned that they used documents and observations, but they did not provide any more detail
about them (Benbasa et al, 1987).
The data needed to perform customer segmentation in our case study were provided by the
company under study. The customer identification (ID) number, the date of each purchase, the
total amount of the purchase and other related fields came from the accounting program of the
ATINET Company.
3.4.2.1 Data Cleaning and Integration
Data cleaning is one of the most important phases in the data mining process. It may sometimes
be time-consuming and frustrating, but it is essential for quantitative research. Generally, if
this phase of the project is not considered as substantial as the other phases, the research is
weakened. In this stage, errors must be detected, missing values must be filled, badly designed
optional fields or useless attributes must be removed, and abnormal, out-of-bounds or
ambiguous items must be checked.
3.4.2.2 Data Transformation
In this step, string variables must be converted into numeric or numeric categorical
variables, and some codes must be interpreted or replaced by text. The other tasks in this phase
are data aggregation and data generalization. In this study, the total purchase data of a
customer in a period of time must be aggregated for the subsequent processes. In data
generalization, low-level data are substituted by higher-level ones.
One of the shortcomings of available customer segmentation models is that they do not
consider behavioral changes of customers during the period of analysis, or at least they do not
consider them through a direct, separately defined parameter. Although the recency parameter is
one of the indicators of this behavior, it is sensitive to transient customer behavior and is
based only on the last purchase date of the customer. So, considering a new parameter seems to be
helpful.
For a company, each customer has different average purchase values during each season of the
year or other predetermined periods. These average values change with the purchase behavior of
the customer. If the average purchase amounts decrease continuously, it can be concluded that the
customer is on the way to canceling its services, or at least to falling from a beneficial
customer segment to a non-beneficial one. A similar conclusion can be drawn for a customer whose
average purchase value increases during the period of analysis. Such a customer can become
a profitable customer for the company.
So, these customers with different behavior must be treated differently.
In order to convert this idea into a computable parameter, all of the purchase amounts of
customers in each period of analysis are required. Then a parameter named change rate of
purchase amount in each time section can be defined as follows:
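\[
\mathrm{ChangeRateOfPurchaseAmount}(k+1) =
\begin{cases}
\dfrac{\mathrm{purchaseamount}(k+1) - \mathrm{purchaseamount}(k)}{\mathrm{purchaseamount}(k)} \times 100\% & \text{if } \mathrm{purchaseamount}(k) \neq 0 \\
100\% & \text{else}
\end{cases}
\tag{3-1}
\]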
in which purchaseamount(k) indicates the total purchase amount of the customer in the kth
time snapshot of the analysis period.
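The definition in (3-1) can be illustrated with a minimal Python sketch. The function name `change_rates` and the sample amounts are hypothetical and are not taken from the case study data; the sketch only assumes that the total purchase amount per time section is already available as a list.

```python
def change_rates(purchase_amounts):
    """Return the percentage change rate between consecutive time sections.

    Follows Eq. (3-1): when the previous amount is zero, the rate is set to 100%.
    """
    rates = []
    for previous, current in zip(purchase_amounts, purchase_amounts[1:]):
        if previous != 0:
            rates.append((current - previous) * 100.0 / previous)
        else:
            rates.append(100.0)
    return rates


# Hypothetical totals for four time sections of one customer.
amounts = [100.0, 80.0, 0.0, 40.0]
print(change_rates(amounts))
```

With four time sections the sketch returns three change rates, in line with the statement that n+1 sections give n change rates.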
If the time period of analysis is divided into n+1 time sections, there will be n change rates
of purchase amount for the whole period. The minimum value of n is 2, in order to have at least 2
parameters for detecting changes in customer behavior. But to make the final indicator of
change rate independent of transient and momentary customer behavior, it is better to increase
the number of time sections. Now, there is a sequence of change rates which can be used to
explore the overall purchase behavior of the customer during the analysis period. The important
step at this point is to map each sequence of change rate values to a single value.
In the simplest approach, if the change rate values of the last two or three time sections
have the same sign (negative or positive), then the average of these values is used as the
final change rate parameter of the customer. Customers whose change rates have different signs
across these time sections are assigned a final change rate of zero. So, there will be
customers with positive, negative and zero final change rate values in the dataset.
The second approach in computing final change rate parameter is averaging all of the
change rate values of all time sections for each customer.
The third approach is extracting and recognizing change patterns of customers using
intelligent algorithms such as neural networks.
Since the third approach is hard to implement and needs many considerations in practice, it is
not suitable for small and mid-sized companies. The second approach also suffers from a drawback
that is best understood by an example. Suppose a customer has two large positive change rate
values at the beginning of the period, followed by four negative change rate values that are
small compared with the first two. Averaging all values will then yield a positive final change
rate parameter. But this positive value does not reflect the fact that this customer is at risk
of canceling the company services, or at least is not so profitable for the company.
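The difference between the first two approaches can be sketched as follows. This is an illustration only: the function names, the `window` parameter (two sections here) and the change rate values are assumptions chosen to reproduce the example above, not values from the dataset.

```python
def final_rate_same_sign(rates, window=2):
    """First approach: average the last `window` rates if they share a sign, else 0."""
    recent = rates[-window:]
    if all(r > 0 for r in recent) or all(r < 0 for r in recent):
        return sum(recent) / len(recent)
    return 0.0


def final_rate_mean(rates):
    """Second approach: plain average over all time sections."""
    return sum(rates) / len(rates)


# Hypothetical customer: two large positive rates, then four small negative ones.
rates = [60.0, 50.0, -5.0, -4.0, -6.0, -5.0]
print(final_rate_mean(rates))       # positive, hides the recent decline
print(final_rate_same_sign(rates))  # negative, reflects the recent decline
```

For this customer the plain average is positive while the same-sign rule returns a negative value, which is the weakness of the second approach described in the text.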
So, it seems that the first method is more appropriate for customer segmentation than other
methods. But, there are two other approaches that can mitigate the weakness of mentioned
methods.
The first solution is to compute the slope of purchasing amount line in time axis. For doing
so, application of linear regression is proposed.
The computation of the slope of purchase amount in time is based on a best-fit regression
line plotted through the known x-values (the times of purchase) and known y-values
(the purchase amounts in each time section). The equation for the intercept of the regression
line, a, is:
\[ a = \bar{y} - b\bar{x} \tag{3-2} \]
where b is the slope of the line and $\bar{x}$ and $\bar{y}$ are the sample means of the
x-values and y-values.
Figure 3.1 shows the concept of slope computation for a sample customer data. In this
graph the purchase amounts of a customer are shown as blue points, while the red line shows the
best-fit regression line for the sample data. The slope of this line indicates the change rate
of this customer’s purchase value.
[Figure 3.1: purchase amount (M) versus time (time sections 1 to 5) for a sample customer, showing the sample data and the approximate line with slope b = tg(α)]
The slope of this line says that this customer has a decreasing purchase amount behavior
equal to b.
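The slope computation can be sketched in a few lines of Python. The sketch uses the standard least-squares formula for the slope b together with the intercept of (3-2); the function name `regression_line` and the five sample amounts are hypothetical and only mimic the decreasing pattern of the sample customer.

```python
def regression_line(times, amounts):
    """Return (a, b): intercept and slope of the least-squares line y = a + b*x."""
    n = len(times)
    x_mean = sum(times) / n
    y_mean = sum(amounts) / n
    # Standard least-squares slope: covariance of x and y over variance of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(times, amounts))
    s_xx = sum((x - x_mean) ** 2 for x in times)
    b = s_xy / s_xx
    a = y_mean - b * x_mean  # intercept, Eq. (3-2)
    return a, b


# Hypothetical purchase amounts of one customer in five time sections.
times = [1, 2, 3, 4, 5]
amounts = [100, 90, 70, 60, 30]
a, b = regression_line(times, amounts)
print(a, b)  # a negative b indicates a decreasing purchase amount
```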
The next new approach proposed in this project is to compute a new parameter, which we
name the discounted purchase amount slope (DPS). In this project we use a definition of slope
that is conceptually slightly more complex. The additional concept needed is that of
discounting. According to this approach, the purchase amount slope of each customer is
computed as the sum of the discounted slopes of purchase amount in all time sequences. The
formula of the DPS is as follow:
The discount rate determines the present value of past slopes. A slope of purchase amount
k time steps in the past is worth only $\gamma^k$ times what it would be worth if it were
received immediately. By defining this parameter, we reinforce the effect of the recent purchase
behavior of the customer in the computation of the total purchase amount slope, while reducing
the importance of previous purchase slopes through a discount factor. For example, for a customer
with 4 time segments, if we set the discount rate equal to 0.7, the last slope is multiplied by 1,
the 3rd one is multiplied by 0.7, the second one is multiplied by 0.49 and the 1st slope is
multiplied by 0.343. Considering this discount factor in the computation of the total purchase
amount slope decreases the effect of the earliest slopes on the DPS parameter.
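The discounting scheme can be sketched as below. The sketch assumes that the DPS is the sum of the per-section slopes weighted by $\gamma^{n-i}$, which matches the worked example (weights 0.343, 0.49, 0.7 and 1 for four slopes with a discount rate of 0.7) and the form of the GDRFM averages; the function names and the slope values are hypothetical.

```python
def discount_weights(n, gamma):
    """Weights gamma**(n - i) for i = 1..n; the most recent slope gets weight 1."""
    return [gamma ** (n - i) for i in range(1, n + 1)]


def discounted_purchase_slope(slopes, gamma):
    """Assumed DPS: sum of per-section slopes, discounted by their age."""
    weights = discount_weights(len(slopes), gamma)
    return sum(w * s for w, s in zip(weights, slopes))


# Hypothetical slopes, oldest first: early growth followed by a recent decline.
slopes = [10.0, 10.0, -5.0, -5.0]
print(discount_weights(4, 0.7))
print(sum(slopes) / len(slopes))               # plain average is positive
print(discounted_purchase_slope(slopes, 0.7))  # DPS is negative
```

In this example the plain average of the slopes is positive, whereas the discounted sum is negative because the recent decline carries more weight.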
It must be noted that, in comparison with the customer value matrix, this method results in
two sub-segments in each cluster.
It means that, for example, the best segment will be divided into two segments with different
DPS signs. One of these clusters has a positive DPS (discounted purchase change rate or slope)
and the other has a negative DPS. So, the company must be careful about those customers of the
best segment with negative DPS, who are at risk of falling to other, less profitable segments.
Armed with this knowledge, the company can quickly communicate with these customers and
attempt to offset this anticipated decline in shopping behavior with targeted offers or
incentives.
By defining this variable and specifying different clusters with different DPS signs, these
two groups of customers will be treated differently, and targeted plans and strategies can be
better designed and adopted based on their purchase behaviors.
It is based on the fact that customer behaviors such as a decrease in the purchase amount,
a decrease in the number of purchases, a decrease in the number of product categories purchased
and an increase in the length of time between purchases can be useful in predicting a
potential decline in customer retention.
These indicators can be addressed by computing the change rates of the RFM variables. In
other words, not only are the RFM values necessary for segmenting customers, but the
derivatives of recency, frequency and monetary amount with respect to time can also be useful
for obtaining better and more adequate segmentation results.
The process of computing average derivatives of RFM parameters is the same as the
process stated in the DPS method.
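If $\frac{dR_i}{dt}$, $\frac{dF_i}{dt}$ and $\frac{dM_i}{dt}$ represent the derivatives of R, F
and M in the ith time step, then the average of these derivatives, considering a different
discount rate for each parameter, can be calculated by the following formulas:
\[
\left(\frac{dR}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_R^{\,n-i} \left(\frac{dR_i}{dt}\right)
\]
\[
\left(\frac{dF}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_F^{\,n-i} \left(\frac{dF_i}{dt}\right)
\]
\[
\left(\frac{dM}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_M^{\,n-i} \left(\frac{dM_i}{dt}\right)
\tag{3-5}
\]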
where $\gamma_R$, $\gamma_F$ and $\gamma_M$ are the discount rates of the recency, frequency and
monetary parameters respectively. $\frac{dR_i}{dt}$, $\frac{dF_i}{dt}$ and $\frac{dM_i}{dt}$ can
be calculated simply by computing the differences of the parameters at two consecutive time
steps:
\[
\frac{dR_i}{dt} = \frac{R_{i+1} - R_i}{t_{i+1} - t_i}
\]
\[
\frac{dF_i}{dt} = \frac{F_{i+1} - F_i}{t_{i+1} - t_i}
\]
\[
\frac{dM_i}{dt} = \frac{M_{i+1} - M_i}{t_{i+1} - t_i}
\tag{3-6}
\]
It must be noted that $\gamma_R$, $\gamma_F$ and $\gamma_M$ can have different values based on
the decision of the analyst and the type of case study.
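Formulas (3-5) and (3-6) can be combined in a short Python sketch. All names (`finite_differences`, `discounted_sum`, `gdrfm_derivatives`), the time axis in months, the discount rates and the sample values are hypothetical; the sketch only assumes that R, F and M have been measured at the same sequence of time steps.

```python
def finite_differences(values, times):
    """Eq. (3-6): difference quotients between consecutive time steps."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def discounted_sum(derivatives, gamma):
    """Eq. (3-5): sum of gamma**(n - i) * derivative_i for i = 1..n."""
    n = len(derivatives)
    return sum(gamma ** (n - i) * d for i, d in enumerate(derivatives, start=1))


def gdrfm_derivatives(r, f, m, times, gamma_r, gamma_f, gamma_m):
    """Return the discounted derivative features (dR, dF, dM) of one customer."""
    return (
        discounted_sum(finite_differences(r, times), gamma_r),
        discounted_sum(finite_differences(f, times), gamma_f),
        discounted_sum(finite_differences(m, times), gamma_m),
    )


# Hypothetical customer observed at four time steps (months 0, 2, 4, 6).
times = [0, 2, 4, 6]
recency = [30, 20, 15, 5]
frequency = [2, 3, 5, 6]
monetary = [100, 120, 160, 180]
print(gdrfm_derivatives(recency, frequency, monetary, times, 0.5, 0.5, 0.5))
```

A separate discount rate is passed for each parameter so that, for instance, recency changes can be discounted more strongly than monetary changes if the analyst decides so.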
The advantage of this method over the simple RFM method is that the GDRFM method considers
changes in customer behavior over time. Changes in the frequency and recency of purchase for a
customer are therefore taken into account simultaneously with the change slope of purchase
amount. So, customers with positive monetary change slope and positive frequency change slope
will be treated differently from customers with negative or mixed frequency and monetary change
slopes.
The last and most important phase of the research is the definition of proper and useful
strategies for each customer segment. This must be performed by adequately analyzing and
studying the customer behaviors of each segment. These strategies must be in the direction of
increasing the profits of the company or other goals specified by the company before running
customer segmentation.
Chapter4: Results & Analysis
Data preprocessing
Data Cleaning
Data integration
Data Transformation
RFM Construction
Customer segmentation
Chapter summary
In this chapter, firstly the results of the data pre-processing phase of the analysis, which
was performed in Microsoft SQL Server 2005 [Ref SQL], are presented. Secondly, the desired
meaningful attributes of customers are generated based on three customer segmentation models:
the RFM model, the customer value matrix and a new method proposed by the author. After that,
the resulting data and customers are clustered into different groups using well-known
clustering algorithms, namely K-means and EM. The above phases together form the customer
segmentation process.
This analysis is based on customer data of ATINET Company over 8 months, from
October 2010 to May 2011. It must be noted that this information relates only to home and
non-official users of the company's services. The reason is that the major customers of the
company, which are mostly official and governmental organizations with large financial
transactions, are few in number and behave differently from other customers.
Customer transaction data and demographic data are gathered to construct a basic customer
profile.
4.1.2 Data integration
The database contains two tables: the customer demographic information table (which consists
of customer-ID, name, family name, e-mail, telephone number, mobile number, birthday, sex,
education, job and age) and the transaction table (which consists of all transactions of each
customer in detail). In order to meet the requirements of data mining, the information of the
two tables must be merged to obtain a customer sale table, which holds the integrated
information for data mining.
For frequency and monetary, the transaction data were aggregated to calculate the total
number of purchases and the total amount spent during this period. The final data, ready for
the next step, have the format illustrated in table 4.1.
Monetary
A sample of the data set on which data mining methods are applied lies in Table 4.2.
User1 28 12 142
User2 92 4 52
User3 8 16 84
…. … … …
[Figure 4.1: quadrants of average monetary against frequency, labelled Spender, Best, Uncertain and Frequent]
Figure 4.1 Segmentation based on customer value matrix
Table 4.4 shows the computed values of customer value matrix in our study and test case.
Each customer’s averages must be compared with total average values. So, each customer
will be allocated exclusively to one of the four segments mentioned above. The output of this
step is a matrix as shown in figure 4.2.
[Figure 4.2: plot of average monetary (vertical axis, 0 to 1000) against frequency F (horizontal axis, 0 to 27)]
Based on the customer value matrix, there are four clusters. When the average purchase
amount of a customer is less than the total average purchase and the average purchase frequency
of the customer is less than the total average frequency, this customer is said to belong to the
uncertain segment. Customers belonging to this segment are uncertain about using the services of
the company, so they purchase occasionally and spend little money. The customers of this segment
must be treated specially, because they may leave our customer list and be absorbed by other
companies. On the other side, customers with average values greater than the total average
values lie in the best customer segment. These customers are very valuable and profitable for
the company. Up-selling and cross-selling are the two main actions that must be adopted for
these customers.
When the average purchase amount of a customer is greater than the total average purchase
and the average purchase frequency of the customer is less than the total average frequency, we
say this customer belongs to the spender segment. These customers are also valuable for the
company because they buy services occasionally but in large amounts or as expensive items. Our
strategy for these customers must aim at increasing their frequency. With this strategy, they
can become our most beneficial customers.
At last, when the average purchase amount of a customer is less than the total average
purchase and the average purchase frequency of the customer is greater than the total average
frequency, we say this customer belongs to the frequent segment. These customers buy cheap items
but frequently. The company must introduce new products or services to them to increase their
average purchase amount.
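The four rules above can be written as a small Python sketch. The function name `value_matrix_segment` and the sample values are hypothetical. The text does not say how a customer exactly equal to an average is handled; the sketch assumes such a customer counts as not greater than the average.

```python
def value_matrix_segment(avg_amount, frequency, total_avg_amount, total_avg_frequency):
    """Assign a customer to one of the four customer value matrix segments."""
    high_amount = avg_amount > total_avg_amount          # assumption: ties count as low
    high_frequency = frequency > total_avg_frequency     # assumption: ties count as low
    if high_amount and high_frequency:
        return "Best"
    if high_amount:
        return "Spender"
    if high_frequency:
        return "Frequent"
    return "Uncertain"


# Hypothetical customers: (average purchase amount, purchase frequency).
customers = {"A": (150.0, 12), "B": (150.0, 2), "C": (20.0, 12), "D": (20.0, 2)}
total_avg_amount = sum(a for a, _ in customers.values()) / len(customers)
total_avg_frequency = sum(f for _, f in customers.values()) / len(customers)
for name, (amount, freq) in customers.items():
    print(name, value_matrix_segment(amount, freq, total_avg_amount, total_avg_frequency))
```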
The percentages of customers who belong to these segments in our research are shown in
table 4.5 and figure 4.3 respectively.
[Figure 4.3: bar chart of the number of customers and percentages in the Uncertain, Frequent, Spender and Best segments]
Figure 4.3. Percentage of customers arranged in each segment based on customer value matrix
Source (McGuirk M., 2007)
In the above figure, three main segments are highlighted: Frequent (Best), At Risk, and
Slow and Steady.
The Frequent or Best segment contains the customers who purchase regularly and spend a
large amount per purchase. These customers are the active and most beneficial customers of a
company. The Slow and Steady segment contains active customers who purchase frequently but
with a small amount each time. At Risk customers are customers who purchase
rarely and with a small amount each time. The recency of these customers is large, which shows
that they may be absorbed by other companies if we do not adopt a proper strategy for them. The
other segments, which are not highlighted in figure 4.4, can be interpreted similarly.
Figures 4.5-4.7 show the histograms of the distribution of RFM values of our dataset in this
project.
Figure 4.5 Histogram of recency (R)
Figure 4.7 Histogram of Monetary (M)
Table 4.6 and figure 4.8 show the results of clustering based on average RFM values as
mentioned above. Figure 4.9 shows the distribution of customers in RFM plane.
[Figure: bar chart of count and percentage of customers in Cluster1 to Cluster8]
An important consideration must be highlighted here. Since the recency, monetary and
frequency parameters lie in different ranges, and specifically the monetary values of customers
are much greater than the other parameter values, it is better to scale all values to a similar
range in order to obtain better and more reliable clustering results with algorithms that use
distance measures. We do this by scaling all values into the range between 0 and 1, which means
that for each parameter we subtract its minimum value and divide by the difference between its
maximum and minimum values. The formula of this normalization is as follows:
\[ X_s = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{4-1} \]
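Formula (4-1) can be sketched in Python as follows. The function name `min_max_scale` and the sample values are hypothetical. The handling of a constant column (all values equal) is an assumption of the sketch, since (4-1) is undefined in that case.

```python
def min_max_scale(values):
    """Scale a list of numbers into [0, 1] using Eq. (4-1)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical monetary values of three customers.
monetary = [20, 60, 100]
print(min_max_scale(monetary))
```

Each of the R, F and M columns would be scaled separately, so that no single parameter dominates the distance computation.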
After that, customer classification was performed using the K-means and EM clustering
algorithms in WEKA software [weka]. Weka is a collection of machine learning algorithms for
data mining tasks. The algorithms can either be applied directly to a dataset or called from user
specified Java code. Weka contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. It is also well-suited for developing new machine
learning schemes. Weka is open source. [weka]
Application of the K-means algorithm results in 8 clusters. The number of members of each
cluster and the average values of each variable in the clusters are shown in tables 4.7 and 4.8
and figure 4.10.
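For readers unfamiliar with the algorithm, the idea of K-means can be sketched in plain Python. This is not the WEKA implementation used in this study; it is a minimal version of the standard assignment and update steps, with hypothetical names, caller-supplied initial centroids and made-up scaled (R, F, M) points.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, iterations=10):
    """Run a fixed number of K-means iterations and return (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centroids


# Hypothetical scaled (R, F, M) values of four customers, two clusters.
points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.2), (0.9, 0.8, 0.9), (0.8, 0.9, 1.0)]
labels, centroids = k_means(points, [points[0], points[2]])
print(labels)
```

The sketch uses two clusters for brevity, whereas the analysis in this chapter uses eight.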
Table 4.7 Percentage of customers arranged in each segment based on k-means algorithm and RFM
[Figure 4.10: bar chart of the number of customers and percentage in Cluster0 to Cluster7]
Figure 4.10 Percentage of customers arranged in each segment based on k-means algorithm and RFM
Table 4.8 Attributes of parameters for each segment based on k-means algorithm in RFM method
0 1 2 3 4 5 6 7
To determine which clustering algorithms perform well and to confirm the existence of
different customer clusters, it is better to run more than one algorithm and then analyze and
compare the results carefully.
As suggested above, the EM clustering algorithm was used in order to compare the results.
This method also yielded eight clusters. The related values are listed in tables 4.9 and 4.10
for this clustering algorithm.
Table 4.9 Percentage of customers arranged in each segment based on EM algorithm and RFM
[Figure 4.11: bar chart of the number of customers and percentage in Cluster0 to Cluster7]
Figure 4.11 Percentage of customers who are arranged in each segment based on EM algorithm
Table 4.10. Attributes of parameters for each segment based on EM algorithm in RFM method
Cluster      0       1       2       3       4       5       6       7
Attribute  (0.07)  (0.2)   (0.06)  (0.23)  (0.16)  (0.06)  (0.17)  (0.06)
The graphs in figures 4.12 to 4.14 show the distribution of customers in the various RFM
planes. The points belonging to different clusters are shown in different colors.
Figure 4.12 distribution of customers in RM plane and their corresponding clusters using EM algorithm
Figure 4.13 distribution of customers in FM plane and their corresponding clusters using EM algorithm
Figure 4.14 distribution of customers in FR plane and their corresponding clusters using EM algorithm
4.2.3 FMC Method Results
After the above analysis, the next approach to be implemented is FMC, or frequency,
monetary and purchase amount change rate. Based on the details of the method described in
chapter 3, there are two approaches for computing the purchase amount change rate. The first
approach is computation of the slope of purchase amount in time using linear regression, while
the second approach is computation of the new parameter named DPS, or discounted purchase
amount slope. In this part, the first approach is used. The purchase amounts of each customer
during the 8 months were divided into 4 parts, based on a time step of 2 months. The slope of
the best-fit line in the time-monetary plane was then computed for each customer, and the
frequency, monetary and change rate parameters were prepared for segmentation. For simplicity,
we used a method similar to the customer value matrix approach. The total average values of the
F and M parameters were calculated over all customers in order to distinguish customers with
values greater or smaller than the average values. The purchase amount change rate values were
classified based on the positive or negative sign of the purchase amount slope. The result is
the eight segments described in chapter 3.
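The rule-based FMC assignment described above can be sketched as follows: the four customer value matrix quadrants are each split by the sign of the purchase amount slope, giving eight segments. The function name, the label strings and the sample values are hypothetical. The treatment of a slope of exactly zero and of values equal to the averages is not specified in the text; the sketch assumes both fall on the lower side.

```python
def fmc_segment(frequency, monetary, slope, avg_frequency, avg_monetary):
    """Return (quadrant, trend) for one customer: 4 quadrants x 2 slope signs."""
    high_f = frequency > avg_frequency   # assumption: ties count as low
    high_m = monetary > avg_monetary     # assumption: ties count as low
    if high_f and high_m:
        quadrant = "Best"
    elif high_m:
        quadrant = "Spender"
    elif high_f:
        quadrant = "Frequent"
    else:
        quadrant = "Uncertain"
    # Assumption: a zero slope is grouped with the descending customers.
    trend = "Ascending" if slope > 0 else "Descending"
    return quadrant, trend


# Hypothetical customers: (frequency, monetary, slope of purchase amount).
print(fmc_segment(12, 300.0, 4.2, avg_frequency=7.0, avg_monetary=85.0))
print(fmc_segment(12, 300.0, -3.1, avg_frequency=7.0, avg_monetary=85.0))
```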
The number of each cluster members and also the average values of each variable in
clusters are shown in table 4.11-4.12 and figure 4.15.
Table 4.11 Percentage of customers arranged in each segment based on FMC method and EM algorithm
Table 4.12 Attributes of parameters for each segment based on FMC method and EM algorithm
Cluster                  0          1          2          3          4
Attribute             (0.01)     (0.28)     (0.25)     (0.4)      (0.07)
F      Mean          20.2395     4.1017    10.1041     6.5021    13.1216
       Std. dev.      2.8634     1.5688     2.8841     1.9783     4.6396
M      Mean         944.4523    23.2833   122.5965    57.3578   292.6166
       Std. dev.    191.5395     8.8141    40.2226    18.4839   124.1838
Slope  Mean          65.7687    -1.6007    -0.1234    -1.9607     2.4561
[Figure 4.15: bar chart of the number of customers and percentage in Cluster0 to Cluster4]
Figure 4.15 Percentage of customers arranged in each segment based on FMC method and EM algorithm
4.2.4 GDRFM Method Results
In this section the results of applying GDRFM method for customer segmentation are
presented.
As described before, the values of Recency, frequency and monetary of each customer
during the 8 months were divided in to 4 parts based on definition of 2 months for time step.
After that the average values of all parameter derivatives with respect to time were calculated
based on the following formulas:
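\[
\left(\frac{dR}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_R^{\,n-i} \left(\frac{dR_i}{dt}\right)
\]
\[
\left(\frac{dF}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_F^{\,n-i} \left(\frac{dF_i}{dt}\right)
\]
\[
\left(\frac{dM}{dt}\right)_{avg} = \sum_{i=1}^{n} \gamma_M^{\,n-i} \left(\frac{dM_i}{dt}\right)
\]
\[
\frac{dR_i}{dt} = \frac{R_{i+1} - R_i}{t_{i+1} - t_i}
\]
\[
\frac{dF_i}{dt} = \frac{F_{i+1} - F_i}{t_{i+1} - t_i}
\]
\[
\frac{dM_i}{dt} = \frac{M_{i+1} - M_i}{t_{i+1} - t_i}
\]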
At last, a table of frequency of purchase (F), monetary (M) and recency together with their
derivatives for customers of company will be obtained. These data must be fed to clustering
algorithm to find customer segments.
The results of clustering using EM algorithm are shown in tables 4.13-4.14 and figure 4.16.
Table 4.13 Attributes of parameters for each segment based on GDRFM method and EM algorithm
Cluster      0       1       2       3       4       5       6       7
Attribute  (0.1)   (0.12)  (0.09)  (0.15)  (0.08)  (0.18)  (0.11)  (0.16)
Table 4.14 Percentage of customers arranged in each segment based on GDRFM method and EM algorithm
Cluster   Number of customers   Percentage
0         53                    10
1         13                    12
2         46                    9
3         87                    16
4         46                    8
5         97                    18
6         55                    11
7         88                    16
Total     544                   100
[Bar chart: number of customers and percentage for Cluster0 to Cluster7]
Figure 4.16 Percentage of customers arranged in each segment based on GDRFM method and EM algorithm
As shown in Table 4.15, cluster 2 is the most beneficial segment because it is superior to the
others in terms of all inputs, R, F and M. Its average recency value is 18, which is smaller than the
total average recency value (a smaller recency is better than a larger one). It also has the
greatest value of dM (positive slope in purchase amount), a large positive dF (positive slope in
purchase frequency) and an approximately zero slope for recency. So, the customers of this segment
were very valuable during the period of analysis. This segment differs from the “Best” segment of
the simple RFM model in that its customers show increasing purchase amounts and purchase counts.
This useful information derives from the ability of the GDRFM method to identify the purchase
change rate of every customer. In the simple RFM model, all customers of the “Best” segment are
treated the same. In GDRFM, however, the company must design a separate, appropriate strategy for
each sub-segment of the “Best” segment based on the sign of the change-rate slope of frequency,
recency and monetary.
Cluster 6 is a group of valuable customers with fairly large purchase amount and frequency.
However, these customers have large negative values of dF and dM, so they are at risk of falling
from a beneficial segment to a non-beneficial one. In other words, this segment is a sub-segment
of the “Best” segment described in the simple RFM model. The customers of this segment must be
treated so as to move them to cluster 2.
Cluster 5 is similar to cluster 6 in terms of R, F and M but has large positive values of dF and
dM. So, this group of customers can easily be guided to cluster 2.
Cluster 7 is inferior to the others in terms of frequency, monetary and recency. It also has large
negative values of dF and dM, so these customers are at risk of cancelling the services of our
company. These customers are not very beneficial for the company, so the strategies for them
must be adopted carefully.
Cluster 0 has the largest negative value of dM and positive value of dR among the clusters
and also has a negative value of dF. So these customers are increasingly tending to cancel their
services with our company.
Cluster 1 has small values of frequency, monetary and recency. It also has positive dF and dM
and negative dR. So, these customers can be treated so as to move them from the non-beneficial
segment to more beneficial segments.
Clusters 3 and 4 can be interpreted similarly.
4.3 Chapter summary
In this chapter, the results of applying the RFM method and its variants to customer
segmentation have been shown. Applying the newly proposed methods to customer
segmentation for an internet service provider shows that the GDRFM and FMC methods are
useful not only for segmenting customers with different values of recency, frequency and
monetary but also for indicating which customers are at high risk of cancelling the
company's services or falling from a beneficial segment to non-beneficial segments. In the
next chapter, the proper strategies for each segment are proposed in detail.
Chapter 5: Strategy Definition
Each of the customer segments found in the previous chapter is explored further to provide a
better understanding and to identify the opportunities and risks that exist in each segment. After
that, we must develop targeted programs and strategies for each segment separately.
The strategies and tactics can be divided into two categories: segment-specific strategies
and cross-segment strategies.
In this chapter, the segment-specific strategies, which relate specifically to each
segment, are defined and explained in detail. The cross-segment strategies, which are common
to all customers, include tactics such as customer retention, service affinity, special services for
loyal customers and setting a membership fee for customers. The cross-segment strategies have not
been investigated in this project.
The main segments found are as follows:
5.1 Best Ascending Segment
Retention of these best customers is critical for the company. Furthermore, the company needs
to know why these customers prefer its services. This knowledge helps the company adopt
appropriate strategies for shifting other customers into this segment. At the same time, it is
important to make every effort to retain these customers.
The best strategy for the Best Ascending Customers segment is to recognize that these
customers are the most important customers of the company and the most worthy of appreciation and
special treatment. These customers need to feel appreciated. So, they must not only be
rewarded with preferential discounts but also be treated specially through higher-quality,
VIP and special services, frequent and high-appreciation communications, timely information
about new products or services, and simpler or closer relations with the company and with other
customers who share their interests, built by holding special events and sessions.
5.2 Best Descending Segment
The best strategies for the Best Descending customers segment are frequent and
high-appreciation communications and timely information about new products or services.
Identifying the reason for the decrease in purchases by communicating with the customer is the
most helpful action that can be taken for this group of customers. After that, a proper strategy for
increasing the number and amount of purchases must be adopted. This can be done by
giving information about all products and services or by giving special services to these customers.
5.3 Best Frequency Descending Segment
This segment is similar to the two segments above in terms of R, F and M but has positive
dM and negative dF. This characteristic indicates that these customers are tending to fall into the
Spender segment. So, we must use not only the strategies defined for the Best segments but also
the strategies specified for the Spender segment.
The characteristics of this segment are shown in Table 5.3.
segment and those who are tending to go to Best segments. So, the retention efforts for those
with negative dM value must be reinforced.
Table 5.5 shows the specifications of these sub-segments.
Table 5.5 Spender segment specifications
               Frequency  Recency  Monetary  dM        dF  dR
sub-segment 1  Low        Low      High      Positive  -   Negative
sub-segment 2  Low        Low      High      Negative  -   Positive
5.7 Uncertain Segment
These customers spend very little and rarely. It is important to investigate why
these customers do not shop frequently or in large amounts.
Customers with negative dF and dM are at risk of leaving the company's services, so we must
adopt proper strategies for them. One possible action is to use promotion plans and
incentives or offers to get these customers more engaged. These offers must
be adequate and profitable for the company: if an offer leads to only one more visit, it will not
be useful for the company. So, we must define a set of the most appropriate offers for distinct
groups of this segment. On the other hand, we must consider that offers, special discounts and
promotional plans have a cost for the company. So, there must be a trade-off between the costs
and the income of these plans, or, better, the offer plans should be optimized using predictive
models and more adequate analysis.
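The cost and income trade-off described above can be illustrated with a simple expected-value calculation. The sketch below is only an illustration: the function name, the offers, the response rates, the margins and the costs are all hypothetical, and it assumes the cost of an offer is paid for every targeted customer whether or not the customer responds.

```python
def expected_net_gain(response_rate: float, extra_margin: float, offer_cost: float) -> float:
    """Expected gain per targeted customer: P(response) * extra margin - cost of the offer."""
    return response_rate * extra_margin - offer_cost


# Hypothetical offers: (probability the customer responds, extra margin if so, cost per customer).
offers = {
    "10% discount": (0.20, 30.0, 4.0),
    "free month": (0.35, 30.0, 12.0),
}

gains = {name: expected_net_gain(*params) for name, params in offers.items()}
worthwhile = [name for name, gain in gains.items() if gain > 0]
print(gains, worthwhile)
```

With these made-up figures only the cheaper offer has a positive expected gain, even though the more generous one has the higher response rate. In practice the response rates would come from the predictive models mentioned above.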
We can define two sub-segments for this group of customers. The first includes
uncertain customers with large negative values of dM and dF and a positive value of dR. These
customers are increasingly tending to cancel their services with the company. They are
not very beneficial for the company, so the strategies for them must be adopted
carefully, considering the trade-off between retention costs and the revenue they bring.
Promotional plans and special discounts are useful for this group of customers.
The second sub-segment consists of customers with a positive value of dF or dM. For these
customers, suitable strategies are cross-selling, special discounts and shifting to the online
shopping channel. These customers must be treated so as to move them from the non-beneficial
segment to more beneficial segments.
Finally, it must be noted that the company can focus its efforts only on those Uncertain
Customers who are new, have a great affinity for a specific type of service, or have a positive
value of dM or dF.
5.8 Chapter Summary
In this chapter, the detailed description and specification of all segments found in our case
study were presented. Based on these specifications, some useful strategies were proposed. Table
5.8 summarizes these characteristics and strategies.
It must be noted that the effectiveness of these strategies must be studied in a separate
analysis.
Table 5.8 Characteristics and strategies for all customer segments
Segment             Sub-segment    Attribute  Attribute value  Strategies
Best Ascending                     R          Low              • Recognizing the importance of customer
Segment                            F          High             • Communication
                                   M          High             • VIP and special services
                                   dF         Positive         • Preferential discounts
                                   dM         Positive         • Informing about new products or services
                                   dR         Negative         • Simplifying or increasing the relations
                                                               • Increasing the relations of these customers with
                                                                 company and other customers who share their
                                                                 interests by holding special events and sessions.
Best Descending                    R          Low              • Frequent and high-appreciation communications
Segment                            F          High             • Informing about new products or services
                                   M          High             • Recognizing the reason of decrease in purchase
                                   dF         Low              • Giving information about all products and services
                                   dM         Low              • Giving special services to these customers
                                   dR         -
Best Frequency                     R          Low              • Frequent and high-appreciation communications
Descending Segment                 F          High             • Informing about new products or services
                                   M          High             • Recognizing the reason of decrease in purchase
                                   dF         Low              • Giving information about all products and services
                                   dM         High             • Giving special services to these customers
                                   dR         -
Best Monetary                      R          Low              • Frequent and high-appreciation communications
Descending Segment                 F          High             • Informing about new products or services
                                   M          High             • Recognizing the reason of decrease in purchase
                                   dF         High             • Giving information about all products and services
                                   dM         Low              • Giving special services to these customers
                                   dR         -
Table 5.8 Characteristics and strategies for all customer segments (continued)
Segment             Sub-segment    Attribute  Attribute value  Strategies
Spenders Segment    Sub-segment 1  R          Low              • Communication
                                   F          Low              • Informing them about new products and services,
                                   M          High               capabilities and unique aspects of our company
                                   dF         -                  in a timely fashion
                                   dM         High
                                   dR         Low
Spenders Segment    Sub-segment 2  R          Low              • Communication
                                   F          Low              • Informing them about new products and services,
                                   M          High               capabilities and unique aspects of our company
                                   dF         -                  in a timely fashion
                                   dM         Low
                                   dR         High
Frequent Segment    Sub-segment 1  R          Low              • Bundling
                                   F          High             • Cross-selling
                                   M          Low              • Up-selling
                                   dF         -                • Providing online shopping channel
                                   dM         High
                                   dR         -
Frequent Segment    Sub-segment 2  R          Low              • Bundling
                                   F          High             • Cross-selling
                                   M          Low              • Up-selling
                                   dF         Low              • Providing online shopping channel
                                   dM         -
                                   dR         -
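The segment definitions in Table 5.8 can be read as a small set of rules on the F and M levels and the signs of dF and dM. The sketch below is one possible encoding, not the procedure used in the study: it assumes that "Low" and "High" for dF and dM in the table mean negative and positive slopes (as the text of Section 5.3 suggests), it ignores recency, and the label returned when neither slope shows a clear trend is a hypothetical placeholder.

```python
def name_segment(f_level: str, m_level: str, d_f: float, d_m: float) -> str:
    """Map qualitative F/M levels ("High"/"Low") and slope signs to a Table 5.8 segment name."""
    if f_level == "High" and m_level == "High":
        if d_f > 0 and d_m > 0:
            return "Best Ascending"
        if d_f < 0 and d_m < 0:
            return "Best Descending"
        if d_f < 0:
            return "Best Frequency Descending"  # frequency falling, amount not falling
        if d_m < 0:
            return "Best Monetary Descending"  # amount falling, frequency not falling
        return "Best (no clear trend)"  # hypothetical label, not a segment in the study
    if m_level == "High":
        return "Spenders"  # low frequency, high monetary
    if f_level == "High":
        return "Frequent"  # high frequency, low monetary
    return "Uncertain"  # low frequency, low monetary


# Made-up cluster centroids described by qualitative levels and slope values.
print(name_segment("High", "High", 0.8, 12.0))
print(name_segment("High", "High", -0.5, 6.0))
print(name_segment("Low", "High", 0.0, -3.0))
```

How a numeric cluster mean is turned into a "High" or "Low" level (for example by comparison with the overall average) is left open here and would need to be fixed before such rules are applied.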
Chapter 6: Conclusion and Further Research
Conclusion
Contributions
Limitations
Future Works
6.1 Conclusion
Customer segmentation is a method for grouping customers based upon similarities they
share with respect to any dimension, whether it is customer needs, channel preferences, interest
in certain product features, customer profitability, etc.
Common customer segmentation objectives are developing new products and services,
creating different marketing communications for different customer groups, developing different
customer servicing and retention strategies, targeting company efforts to segments with the
greatest profit potential and developing any strategy that may help the company in increasing its
profits and customer retention.
Customer segmentation and the definition of proper strategies for each segment can provide
tremendous returns for companies. There are various models for implementing
customer segmentation, including RFM, the customer value matrix, CLV and data
mining methods. However, there is great value in keeping things simple,
especially for small and medium-sized businesses. Methods derived from complex
statistical modeling techniques can provide useful information for experts but are hard for
these businesses to implement and are likely to present a challenge to the development and
implementation of strategies.
In this study, the Recency, Frequency and Monetary method, also known as the RFM
method, was used for customer segmentation in an Iranian internet service provider.
Customer data and attitudes were mined in order to perform customer segmentation and,
consequently, to define proper and useful strategies that give a better view of the company's
customers and their behaviors and increase its profitability. The company can also recognize
and classify potential customers as more or less important in order to set up a proper marketing
plan for those particular customers.
By defining some new variables in the RFM method, two new RFM variants have been
proposed that have some advantages over the simple RFM model. The results of
applying these new methods show their effectiveness for customer segmentation and their
ability to identify customer behaviors, especially the risk of cancelling company services.
Customers with different purchase behavior must be treated differently. In order to
convert this idea into a computable parameter, all purchase amounts of each customer in each
period of analysis were collected, and a parameter named the change rate of purchase amount in
each time section was defined. The slope of purchase amount over time can be computed either from
a best-fit regression line through the known x-values (the times of purchase) and
known y-values (the purchase amounts in each time section) or with a new approach proposed
in this project. This approach is based on a new parameter which we named the
discounted purchase amount slope (DPS). According to this approach, the purchase amount slope
of each customer is computed as the sum of the discounted slopes of purchase amount over all time
sequences. The discount rate determines the present value of past slopes. By defining this
parameter, we reinforce the effect of the customer's recent purchase behavior in the computation
of the total purchase amount slope, while the discount factor reduces the importance of earlier
purchase slopes.
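To make the difference between the two slope computations concrete, the sketch below compares an ordinary least-squares slope with a discounted sum of period-to-period slopes for one customer. The function names, the purchase amounts and the discount factor of 0.5 are assumptions chosen for illustration; the weighting assumes the most recent slope has weight 1 and each earlier slope is multiplied by one more factor of the discount.

```python
from typing import Sequence


def regression_slope(times: Sequence[float], amounts: Sequence[float]) -> float:
    """Slope of the least-squares line through (time, amount) points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(amounts) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, amounts))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


def discounted_purchase_slope(
    times: Sequence[float], amounts: Sequence[float], gamma: float
) -> float:
    """Sum of period-to-period slopes, with older slopes discounted by powers of gamma."""
    slopes = [
        (amounts[i + 1] - amounts[i]) / (times[i + 1] - times[i])
        for i in range(len(amounts) - 1)
    ]
    last = len(slopes) - 1
    return sum(gamma ** (last - i) * s for i, s in enumerate(slopes))


# Made-up customer whose spending rose for three periods and then dropped sharply.
times = [0, 2, 4, 6]
amounts = [100.0, 160.0, 220.0, 150.0]

ols = regression_slope(times, amounts)
dps = discounted_purchase_slope(times, amounts, gamma=0.5)
print(ols, dps)
```

For this made-up series the regression slope is positive while the discounted slope is negative, because the discounting gives the recent drop more weight than the earlier growth.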
The next variant of the RFM method proposed in this project is based on the idea of the value
change rate stated above, and we named it Generalized Differential RFM or
GDRFM. If we generalize the computation of the purchase amount change rate to the R, F and M
parameters, we can distinguish at-risk customers and customer segments more adequately. It is
based on the fact that customer behaviors such as a decrease in the purchase amount, a decrease in
the number of purchases, a decrease in the number of product categories purchased and
an increase in the length of time between purchases can be useful in predicting a potential
decline in customer retention. These indicators are addressed by computing the change rates of
the RFM variables. In other words, segmenting customers requires not only the RFM values
but also the derivatives of recency, frequency and monetary amount
with respect to time, which can give better and more adequate segmentation results.
The advantage of this method over the simple RFM method is that
GDRFM considers changes in customer behavior over time. Therefore,
changes in the frequency and recency of purchase for a customer are taken into account
simultaneously with the change slope of purchase amount. So, customers with a positive monetary
change slope and a positive frequency change slope will be treated differently from customers
with negative or different frequency and monetary change slopes.
The clustering algorithms used for segmentation of our data were the k-means and EM
algorithms. Finally, the detailed description and specification of all segments found in our case
study were presented and, based on their specifications, some useful strategies were proposed.
The results of applying the RFM method and its variants (GDRFM and FMC) to customer
segmentation show that the GDRFM and FMC methods are useful not only for segmenting
customers with different values of recency, frequency and monetary but also for indicating
which customers are at high risk of cancelling the company's services or falling from
beneficial segments to non-beneficial segments.
Since there are many methods for customer segmentation and it is difficult to compare
all of them, it would be useful to develop an experiment comparing the advantages and
disadvantages of existing segmentation methodologies in the future.
The other point is that the effectiveness of the strategies defined for each segment must be
studied in a separate analysis. This will give us a better
understanding of the usefulness or weakness of our proposed methods and strategies.
6.2 Contribution
In this project we proposed two new variants of the RFM method, namely the GDRFM and
FMC methods. These methods use new variables that indicate changes in the purchase behavior of
customers. We also proposed a novel approach for formulating these changes by introducing a
discount parameter. This discount parameter reinforces the effect of recent purchase
changes in comparison with older ones.
Customer segmentation using the GDRFM method provides a viable alternative for companies
that is simple to implement relative to the effort involved and easy to understand.
The ability to differentiate customers based on changes in their
purchase behavior and to identify customers who are at risk of cancelling the
company's services are the main features of these newly proposed methods.
6.3 Limitations
The proposed methods require a large amount of customer data in order to be validated adequately.
In this study, we had limited and incomplete information about customers, their purchase history
and especially the services or products they had purchased from the company.
Because of our incomplete and unsuitable database, we could not analyze other customer
segmentation methods, and consequently it was impossible to compare our proposed methods
with other methods.
6.4 Future Works
The future works proposed to follow this study are as follows:
• The GDRFM method must be compared with other customer segmentation methods such as CLTV.
• The effects of changing γ (the discount factor) on the segmentation results for monetary,
frequency and recency can be investigated.
• The effect of the number of time steps on the results, and the optimum number of time
segments, can be examined in future work.
• The effectiveness of the GDRFM method for large, medium and small businesses must be
compared and analyzed. The advantages and disadvantages of the proposed methods for
companies of different sizes can be analyzed and investigated.
• Regarding strategy definition, the effectiveness and profitability of
these strategies, and the optimization of decisions based on their costs and revenue, must be
investigated in a separate future study.
• …
Software
Weka
SQL server 2000