Behavioural Segmentation
All content following this page was uploaded by Sahar Allegue on 28 May 2021.
Abstract—In an exceptionally competitive financial market impacted by legislative changes and by the evolution of customers’ behavior, banks must provide customer-centric assistance, services and products. To understand their customers, banks classically rely on segmentation, classifying customers according to well-defined banking rules or customer profiles (RFM). In this paper, we propose a novel segmentation, called RFMC, that better represents customers’ behavior by using not only customers’ profiles but also a categorization of their spending. We show that, compared to the classical RFM-based segmentation, RFMC allows for advanced services that better fulfill customers’ wishes, expectations and needs by leveraging offers that fit their spending behaviors.

Index Terms—Categorization, Clustering, RFM, Machine learning

This project is carried out under the MOBIDOC scheme, funded by the EU through the EMORI program and managed by the ANPR.

I. INTRODUCTION

Customers are the most important asset in the banking business. For this reason, banks are attempting to make their business and services customer-centric, especially in the current multi-channel banking environment where behavioral, social and demographic characteristics of customers are changing quickly. This dynamic environment creates interest in adaptive customer management that responds to customer needs. In this regard, customer segmentation enables banks to divide customers into distinct and internally homogeneous groups with similar characteristics and to interact with each customer segment separately to gain a more profound understanding of their customers’ characteristics and needs [2]. Moreover, customer segmentation is a critical success factor for understanding the behavior of different groups of customers and evaluating their business value. With proper segmentation, enterprises can direct the right products, services and assets to interested customers and build a close relationship with them [11].

To perform customer segmentation, several techniques have been proposed in the literature; among them, clustering is the most commonly used. Clustering can be based on customer profiles or on RFM analysis [23] [24]. In recent years, several research works have used the RFM concept to build customer segmentation models in different application areas [15] [12] [18]. The RFM (Recency, Frequency, Monetary) model was first proposed by [37] to analyze and predict customers’ behavior. Recency is the time interval since the last purchase or transaction (e.g., in days or months). It gives information about the purchasing or visiting potential of the customer: if this interval is short, the probability of repurchase or return is high. Frequency is the number of purchases (transactions) or visits within a specific time frame, and it is an indication of customer loyalty: the higher the frequency, the higher the loyalty. Monetary is the total amount spent, or the average amount spent per visit (or transaction), during a specific time frame, and it measures the contribution of the customer to the revenue of a company: the larger the amount spent, the more the customer contributes to the revenue. Recent research works have attempted to improve the RFM concept by adding extra features or applying data mining techniques [16] [13] [19] [20]. Although the conventional RFM model performs well in customer segmentation, it generally ignores the items purchased and their category, which provide significant information about customer spending behavior. More precisely, the RFM model considers only the purchasing potential, so customers with similar profiles may have completely different spending patterns. Thus, marketers cannot push offers or services to them as if they were a homogeneous group.

To make up for the above-mentioned shortcomings, this study develops a novel segmentation methodology based on the transaction (purchased item) category. We propose an extended RFM model, namely RFMC (Recency, Frequency, Monetary and Category), by adding the category as a new segmentation criterion. RFMC can be the basis for building other services such as a customer-centric next best offer and fraud detection. For example, our spending-based segmentation allows marketers to deliver the next best offers to customers in the same segment based on the categories in which they spend more and more frequently. Consider a segment in which most of the members spend 30% of their income in the shopping category. This observation can lead the bank’s marketers to offer the customers of this segment a shopping credit card. In other contexts, it can help in detecting fraud. For instance, if a customer changes his spending habits and uses cash for big bank transactions, then this event can be flagged as suspicious and the bank gets notified. The RFMC segmentation system is based on a categorization service where all transactions are automatically labeled according to the transaction information. The categorization service provides customers with an API to modify or add new categories or subcategories. Therefore,
for each customer, we build a dedicated machine learning model that learns the customer’s categories. The customer-specific categories are then unified so that if two customers introduce different terms (different languages, spellings or synonyms) for the same category, the unification service maps them to the same category. A modular service-oriented architecture is adopted so that the available services can be easily enhanced and extended. The deployment of RFMC segmentation in a Tunisian bank shows the relevance of our work for building an enhanced customer-centric next best offer as well as an informative 360 view of customers. Note that this work focuses on the banking domain, but RFMC can be applied to other transaction-based applications such as e-commerce.

The remainder of this paper is organized as follows. Related works are presented in Section 2. In Section 3, we present the segmentation system’s architecture, the methodology of the study and the proposed RFMC model. A case study of a Tunisian bank, with results and evaluation, is provided in Section 4. Finally, we conclude the paper in Section 5.

II. RELATED WORKS

Customers have fluctuating needs, behaviors and preferences, and it is challenging for companies to serve all customers equally well. Customer segmentation emerged in response to this issue. It was first introduced by [2]. Customer segmentation is classically adopted to detect the most profitable and loyal customers [25] [26] [27] [29] [36]. It provides the opportunity to tailor marketing actions and materials more accurately to individual customer needs [30] [31] [35]. In this regard, many studies [26] [32] [33] have applied customer segmentation to improve customer management and marketing strategies for various kinds of customer groups. The segmentation can be based on general variables that incorporate customer socio-economics (for example sex, age, income, education level, etc.) and lifestyles (psychographics). Alternatively, it can be based on product-specific variables that incorporate customer buying behaviors (for example frequency of purchase, consumption, spending, etc.) and intentions. In this context, RFM models are the most used characterization of customers’ purchase behavior [24] [34]. Indeed, they are efficient in understanding and segmenting customer behavior. In recent years, it has been shown that different types of RFM models mostly perform well in customer segmentation in various fields, including healthcare [33], health and beauty [10], outfitters [41], financial and non-profit organizations [1] [18] [42], hairdressing [32], government agencies [4], online industries [5], textile [40], communication industries [6], tourism [28] [22], the logistics industry [9], the marketing industry in particular [7], etc.

The RFM model has evolved over the previous two decades. Many works have tried to establish new RFM models either by considering additional variables or by excluding some of the variables according to the nature of the product or service. For example, [38] set up the LRFM model by adding another element, customer relation length (L), to the original RFM model. As stated in [39], customer relation length plays a fundamental role in customer loyalty, and the LRFM model has subsequently become one of the most widely used RFM models, receiving considerable attention in recent literature [13] [33] [40] [41] [8]. Another study extended LRFM to LRFMP, where the P factor denotes the periodicity of the customer’s return. A study in Turkey used this model with the k-means algorithm to analyze the customers of a grocery store [3]. [14] proposed the RFMTC model by including two variables: time since first purchase (T) and churn probability (C). In a similar study, [16] extended the RFM model to the GRFM model by including product category group information. The GRFM model was later used by other works in the literature [20]. In [9], the authors considered profit as a new variable in addition to the LRFM variables, while [33] excluded the monetary variable from the original LRFM model. [21] introduced another approach for segmenting customers with RFM criteria: they redefined the RFM variables using the median value of each variable R, F. In [17], the authors proposed a novel RFM framework named R+FM for customer segmentation. They isolated R from the two other variables (F and M), since Recency shows only the time of the last purchase, whereas Frequency and Monetary indicate the loyalty of customers. In another study [10] on the cosmetics industry, the RFM model was compared with a modified model that used the number of purchased items as an additional factor; the outcomes did not show any difference. Despite their advantages, all these proposed RFM models overlook what is purchased per segment and fall short when used in spending-category-based services such as next best offers and customer-centric advice, or to represent a precise 360 view of customers.

III. RFMC-BASED SEGMENTATION

RFMC is based on the classical RFM features, which are recency (R), frequency (F) and monetary (M), and on the newly proposed feature category (C). Category refers to the category of a transaction made by the customer. This new feature reflects the spending behavior of customers. This section first introduces the system architecture and building blocks, and then the underlying algorithms and models.

A. System architecture

The segmentation component is a base building block in a banking smart budget system. The architecture of this system is illustrated in figure 1.

The main architecture components are the following:
• The Pre-processor is responsible for preparing transaction labels by combining some textual features into one description (we consider Transaction Label, Merchant Name and Merchant Activity). It applies cleaning steps to this description, such as using regular expressions to remove all non-alphabetical symbols, removing all words shorter than a defined number of characters, removing stop words, stemming, lemmatizing, etc.
• The Model Creator initiates the first version (global version) of the classification model in offline mode. It
is trained on data from all users, so all results are based on a large user base. Then, based on a text classification machine learning algorithm (Naive Bayes) and a training data set containing observations and their corresponding target categories, a dedicated model is created. Every customer has his own model, which he can update.
• The Budget Classifier applies the customer model to assign a category to each incoming pre-processed transaction. If the customer accepts this classification, the Budget Classifier stores the categorized data in the database reserved for categorized data; otherwise, the new category is passed to the Model Enhancer. Indeed, the underlying machine learning model is self-corrected each time the customer confirms or corrects a classification prediction.
• The Category Unifier processes the user-specific categories to match them with the predefined bank categories. In other words, if a customer adds or updates a category name, the Category Unifier finds the relevant category among the reference categories to match it with. Thus, the user experience is presented homogeneously and category-based segmentation is possible. For that, we use natural language understanding and natural language processing techniques [45] [46] [47].
• The Model Enhancer takes the user’s category modifications into account and updates the customer model accordingly. This update is applied to subsequent transactions. In other words, user-specific categories can be created and updated so that the customer model is dynamically updated as customer spending evolves.

Fig. 1. Architecture of the banking smart budget system

B. Algorithms and models

In our data analysis and modeling, we adopt the CRISP-DM data science methodology, from business understanding up to evaluation. The modeling step is based on an efficient combination of the RFMC model and clustering algorithms. This study uses real bank customer data features and assumes that all the model variables (R, F, M and C) have equal weights (i.e., are equally important). First, we extract data from the databases of the bank. This data includes customers’ personal information and transactional data. Then we clean the attributes, for example by deleting non-numerical values and rows with important values missing. Afterward, we categorize all the customers’ transactions as shown in figure 1. From the list of attributes of customers’ transactions, four are selected: the customer ID, the amount, the date and the category. RFMC parameters are then collected, pre-processed and prepared in each transaction category for use in the next steps. We conducted this data transformation using data aggregation, grouping by the CustomerID attribute for each category to produce these variables.

Afterward, we perform clustering operations based on the RFMC model. A well-known and commonly used clustering technique, namely the K-means algorithm [43], is tested to segment customers. Before applying the chosen clustering algorithm, finding an optimal number of clusters (k) is a critical issue. For this purpose, various indices have been proposed in the literature; they differ in the way they quantify and combine the concepts of compactness and separation. In this study, total WSS (within sum of squares) is used as the cluster validity index and the elbow method is employed to determine the optimal number of clusters. The idea behind the elbow method is to identify the value of k beyond which the score stops decreasing rapidly and the curve reaches a plateau. We evaluate the clustering with the silhouette and Davies-Bouldin (DB) indicators [44]. The DB index captures the intuition that clusters that are well separated from each other and themselves very dense are likely a good clustering: the smaller the DB index, the better the clustering. The silhouette score measures the average similarity of the objects within a cluster relative to their distance to the objects in the other clusters. Then we analyze the status of customers in each cluster. Finally, according to the results, we provide guidelines for identifying the valuable customers of the bank and providing adequate services for them. Note that RFMC model features vary in range, and the scale differences among these attributes influence and distort the results of clustering analysis. Standardization solves this problem by reducing the potential effects of these differences. Therefore, before clustering, the RFMC variables are standardized using the most widely used scaling technique, simple z standardization, which re-scales each variable to have a mean of 0 and a standard deviation of 1.

IV. APPLICATION OF RFMC-BASED SEGMENTATION TO A TUNISIAN BANK

A. Dataset: Sample and Data Description

The bank investigated in this study is a local bank that has many agencies in Tunisia. The original data set was extracted from the bank database and contains almost 3 million transactions of 397372 customers during the period between January 1, 2018, and December 31, 2018. Within that period, several customers made many transactions. A series of data pre-processing tasks was performed, including deleting transactions with missing values, removing duplicate records, and aggregating some features. As a result, the final dataset contains transaction records of 120222 customers. In the dataset, each transaction record contains the customer’s Account number, Branch code, transaction date, transaction amount, Debit/Credit transaction, transaction code and the corresponding transaction label, transaction currency code and transaction channel. In the case of TPE (electronic payment terminal) transactions, we also have the merchant category code information. Transaction categories and sub-categories are then predicted by our categorization engine. Thus a category feature is added to each transaction record. The features of the RFMC model were generated for each customer in each category, and the descriptive statistics of these attributes (mean values, standard deviations, maximum and minimum values) are presented in figure 2.

The predefined list of categories given by the bank is listed below:
'Uncategorized', 'Fees Charges', 'Bills and Utilities', 'Food Dining', 'Health Fitness', 'Shopping', 'Personal Services', 'Auto Transport', 'Travel', 'Personal Care', 'Education'.

B. Clustering results

Following the methodology presented in Section 3, the customers’ transaction data are cleaned, categorized and normalized. As the category feature is categorical (nominal), we convert it to numerical form using one-hot encoding. Then, the k-means method is applied to cluster the customers. Before applying k-means, the number of clusters k must be fixed. Figure 3 shows the WSS (within sum of squares) results of the K-means algorithm for different numbers of clusters (k) ranging from 2 to 14. The plot of figure 3 indicates that there are larger decreases in WSS values until k=6, and subsequent clustering with a higher number of clusters does not show considerable decreases. Therefore, 7 was chosen as the optimum number of clusters and we took the results of clustering with k = 7. Figure 4 shows all the details for each cluster, such as the sample size and other descriptive statistics of the RFMC attributes. After performing the K-means algorithm, the value
Fig. 4. Obtained clusters with RFMC
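
The z standardization and the WSS scan over k described in Sections III-B and IV-B can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the helper names (`z_standardize`, `kmeans`, `total_wss`), the plain Lloyd-style k-means with a seeded random initialization, and the nine toy rows are all hypothetical, and the one-hot encoding of the category feature and the silhouette and DB indices are left out for brevity.

```python
import random
from statistics import mean, pstdev

def z_standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iterations (illustrative); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([mean(c) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def total_wss(points, centroids, labels):
    """Total within-cluster sum of squares, the index used for the elbow plot."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical rows for one spending category: [recency_days, frequency, monetary].
rows = [
    [2, 30, 900.0], [3, 28, 950.0], [4, 32, 870.0],
    [40, 6, 200.0], [45, 5, 180.0], [50, 7, 220.0],
    [200, 1, 20.0], [210, 2, 35.0], [190, 1, 25.0],
]
scaled = z_standardize(rows)

# Scan a small range of k (the study scans 2 to 14 on the real data).
wss_by_k = {}
for k in range(1, 7):
    centroids, labels = kmeans(scaled, k)
    wss_by_k[k] = total_wss(scaled, centroids, labels)
    print(k, round(wss_by_k[k], 3))
```

Plotting `wss_by_k` against k and looking for the point after which the curve flattens gives the elbow. Because this k-means depends on its initialization, a more careful version would repeat each k with several seeds and keep the lowest WSS.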