Final Compare

This document presents research on the comparison of the actual value and score value of the Recency-Frequency-Monetary (RFM) model for clustering in e-commerce applications. The study analyzes a dataset of 273,454 transactions to determine that the actual RFM value model yields better clustering results than the RFM score model, achieving a silhouette score of 0.624646 with three clusters. The findings emphasize the importance of selecting the appropriate RFM model to enhance customer segmentation and improve business strategies.


Accredited Ranking SINTA 2

Decree of the Director General of Higher Education, Research and Technology, No. 158/E/KPT/2021
Validity period from Volume 5 Number 2 of 2021 to Volume 10 Number 1 of 2026

Published online on: http://jurnal.iaii.or.id

JURNAL RESTI
(Rekayasa Sistem dan Teknologi Informasi)
Vol. 7 No. 6 (2023) 1430 - 1438 ISSN Media Electronic: 2580-0760

Comparison of the RFM Model's Actual Value and Score Value for Clustering
Samidi1, Ronal Yulyanto Suladi2, Dewi Kusumaningsih3
1,2,3Master of Computer Science, Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia
[email protected], [email protected], [email protected]

Abstract
Clustering algorithms and Recency-Frequency-Monetary (RFM) models are widely implemented in various sectors, such as e-commerce, banking, telecommunications, and other industries, to obtain customer segmentation. The RFM model assesses each data record in terms of the recency and frequency of its appearance as well as the monetary value of the transactions made by a customer. Choosing the right RFM model also influences the analysis of cluster results: the output is more compact within the same cluster (intra-cluster) and better separated from other clusters (inter-cluster). Through an experimental approach, this research aims to find the best dataset transformation model between actual RFM values and RFM scores. The method used is to compare the actual RFM value model and the RFM score model, using the silhouette score as an indicator of the best clustering results obtained with the K-Means algorithm. The subject of this research is a stall-based e-commerce application, for which data was taken in the Wiradesa area, Central Java. The resulting dataset consisted of 273,454 rows with 18 attributes from January 2022 to December 2022, collected from the historical shopping data of outlets buying from wholesalers. The dataset was transformed using the RFM method into actual values and score values, and each transformed dataset was then used to obtain the best clusters. The results of this research show that time-based transaction data (time series) can be transformed into data in the RFM model, and that the RFM actual value model is better than the RFM score model, with a silhouette score of 0.624646 and the number of clusters (K) = 3. The clustering process also produces data with a cluster label, which can serve as labeled data for supervised learning.
Keywords: RFM model; RFM actual value; RFM score value; clustering

1. Introduction

The commercial industry aims to optimize return on investment in several ways, such as acquiring new customers by influencing and attracting them, or retaining existing customers by providing new offers and products that increase revenue [1]. According to Pareto, only 20% (one-fifth) of a company's customers contribute more to its revenue than the other customers [1]. Customers have diverse and different priority tendencies; therefore, customer grouping or segmentation is considered one of the best ways to manage and understand customers [2], [3], [4], [5].

Various studies have been carried out on customer grouping, known as customer segmentation, which tries to group customers based on certain similar characteristics. Grouping or segmenting customers using data mining can give an organization an advantage in analyzing customer behavior and other matters related to relationships [6].

The RFM model is a behavior-based model that is used to analyze customer behavior and then make predictions based on the behavior database [7]. It classifies customers based on recency (when was the last transaction made?), frequency (how often did the customer make a transaction?), and monetary (the value of transactions made) [8], and it has been widely used to analyze customer value in combination with clustering techniques [9].

The RFM model, in both its score and actual value forms, is used in various industrial sectors in combination with clustering techniques and CLV (customer lifetime value) analysis. The RFM score

Accepted: 06-09-2023 | Received in revised: 25-12-2023 | Published: 28-12-2023



model has several score calculation techniques, for example, the customer quintile method and the behavior quintile method [7]. The actual RFM model, by contrast, aggregates the data using the total (sum/count), average (mean), min, max, and median [10], and is then analyzed based on the average of each attribute R, F, and M, so that each attribute can be marked with the symbol (↑) when its value is above the average (high) and with the symbol (↓) when its value is below the average (low) [11]. The RFM actual value model generally normalizes the R, F, and M attribute values with the standard scaler/z-score technique, which replaces the scoring technique used by the RFM score model [12].

Customer segmentation or grouping is commonly obtained with a clustering algorithm. Clustering algorithms such as K-Means, Agglomerative, and DBSCAN group data based on similarity, so that data with similar attribute characteristics are grouped in one cluster (homogeneous), while data with different attribute characteristics (heterogeneous) are grouped in different clusters. Clustering with various combinations of clustering algorithms and RFM models has been applied widely in various fields, for example, online retail data [13], e-commerce data [14], banking transaction data [15], and telecommunication company transaction data [16].

Previous research stated that the RFM model selected as input to the clustering algorithm influences the quality of the cluster results [7], [8], [9]. The quality of the cluster results is calculated using one of the cluster validation methods, the sum of squared errors [17]. In addition, the selection of the right RFM model also influences the analysis of cluster results: the output is more compact within each cluster (intra-cluster) and better separated from other clusters (inter-cluster) [18].

The object of this research is an e-commerce platform that accommodates the needs of the traditional retail (outlet) ecosystem. The platform connects retailers and outlets with wholesalers in the same sub-district area: wholesalers register all their products, and the outlets then carry out shopping transactions for those products through the platform. To increase salespersons' efficacy in visiting active merchants and meeting retail priorities and demands, this e-commerce platform must group the current retail outlets. Currently, salespeople visit locations based solely on retail demand and without regard to priority, which prevents retailers from meeting their growth ambitions.

This study compares the actual value of the RFM model with the RFM score. The comparison is based on the cluster validation value, using one of the clustering algorithms to obtain the validation value of the cluster results. In addition, the elbow method is used to determine the best number of clusters [19]. The dataset used in the formation of the RFM model is the outlet-to-wholesaler shopping transaction history queried from the e-commerce platform, with a total of 273,454 transactions and 18 attributes from January 2022 to December 2022. This study retrieved transaction data from one district, namely Wiradesa in the district of Pekalongan, Central Java. The RFM model with the best cluster validation is used as the input for the clustering model. The cluster output is then interpreted based on RFM segmentation analysis to obtain more interesting information and knowledge than cluster parameters alone provide [20], with the aim of making the interpretation of the clustering deeper and more varied as suggestions and recommendations for the business domain.

2. Research Methods

This research goes through the stages shown in Figure 1.

Figure 1. Research Framework

Collect and Select Data, collecting and selecting data and information sourced from literature studies, reading

DOI: https://doi.org/10.29207/resti.v7i6.5416
Creative Commons Attribution 4.0 International License (CC BY 4.0)

and studying research related to the research topic, observing the research object, viewing and understanding outlet shopping transaction data by querying databases, and systematically recording and observing the problems examined, with the aim of obtaining data as input for the RFM model process.

Formation of the RFM Model, historical transaction data serves as the data source for the RFM model, which is based on earlier research by [2], [10], [17] and others. This research uses historical outlet shopping transaction data for 12 months (January–December) in 2022.

RFM Actual Value, the RFM model describes customer consumption behavior based on past transaction databases in a simplified form with three attributes [2], namely Recency (R), Frequency (F), and Monetary Value (M). Recency (R) is the interval between a customer's last transaction and a specific reference time. The shorter the interval, the greater the R value. Frequency (F) is the number of transactions in a certain period, for example, twice in one year or twice in one month. The higher the frequency, the greater the F value. Monetary Value (M) is the amount of money spent on products in a certain period. The greater the amount of money in that period, the higher the value of M.

Figure 2 shows the RFM actual value model diagram:

Figure 2. RFM Actual Value Model Diagram
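
The aggregation of transaction rows into actual R, F, and M values can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample rows and the `build_rfm` helper are hypothetical, frequency is assumed to count distinct order numbers, and recency is assumed to be measured in days. The reference date of January 1, 2023 and the field names follow the paper's description of the dataset in Section 3.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (retail_id, order_date, order_no, amount_sales_order).
transactions = [
    ("C1", date(2022, 12, 29), "SO-001", 150_000),
    ("C1", date(2022, 11, 2), "SO-002", 90_000),
    ("C2", date(2022, 3, 15), "SO-003", 40_000),
    ("C2", date(2022, 3, 15), "SO-003", 10_000),  # second line of the same order
]

REFERENCE_DATE = date(2023, 1, 1)  # reference time stated in the paper


def build_rfm(rows, reference_date):
    """Aggregate transaction rows into actual R, F, M values per outlet."""
    last_order = {}
    orders = defaultdict(set)
    spend = defaultdict(int)
    for retail_id, order_date, order_no, amount in rows:
        # Track the most recent order date of each outlet.
        if retail_id not in last_order or order_date > last_order[retail_id]:
            last_order[retail_id] = order_date
        orders[retail_id].add(order_no)  # assumption: one order number = one transaction
        spend[retail_id] += amount
    return {
        rid: {
            "recency": (reference_date - last_order[rid]).days,
            "frequency": len(orders[rid]),
            "monetary": spend[rid],
        }
        for rid in last_order
    }


df_rfm = build_rfm(transactions, REFERENCE_DATE)
```

In this sketch `df_rfm` is a plain dictionary keyed by outlet; in a data-frame library the same result would typically come from a group-by on the outlet identifier.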

The results of the process of forming the dataset into the RFM model are stored in a data frame named DF_RFM.

RFM Score Value, an RFM model that transforms RFM values into quantitative scores; the steps are [17]: (a) sort the dataset by attribute R from the most recent date to the oldest; (b) divide the dataset into 5 quintiles and give a value of 5 to the first 20% of the dataset, a value of 4 to the second 20%, and so on down to a value of 1; (c) repeat steps (a) and (b) for attributes F and M by sorting F and M in descending order and assigning values; (d) sort F within each R category and sort M within each combination of R and F categories.

This model produces an RFM segmentation with the criteria and scoring in [2], [20], which are then used in the RFM analysis as shown in Table 1.

Table 1. RFM Scoring

Criteria | Recency Score | Frequency and Monetary Score
Champions | 5 | 4-5
Loyal customers | 3-4 | 4-5
Potential loyalists | 4-5 | 2-3
Promising | 4 | 1
Can't lose them | 1-2 | 5
At risk | 1-2 | 3-4
About to sleep | 3 | 1-2
Hibernating | 1-2 | 1-2
New customers | 5 | 1
Need attention | 3 | 3

Standard Scaler Normalization, normalization is carried out so that the ranges (scales) of the recency, frequency, and monetary values do not differ much. In this study, normalization uses standardization or z-score normalization, where the normalization process is based on the mean and standard deviation as shown in Formula 1 [21].

$$v' = \frac{v - \mu_A}{s} \qquad (1)$$

µ_A is the mean of attribute A, v is the value, and s is the standard deviation. For example, what is the z-score of 73600 if µ = 54000 and s = 16000? Then v' = (73600 - 54000)/16000 = 1.225.

Each attribute R, F, and M with actual values is normalized using the standard scaler technique, so that each normalized attribute has a mean of 0 and a standard deviation of 1.

The K-Means algorithm is the clustering algorithm most widely used in data grouping processes in various industrial and scientific fields, such as marketing, computer vision, and geo-statistics. The advantages of


K-Means are that it is simple and easy to implement and has a relatively fast processing speed. In addition, the algorithm is very good at processing quantitative data with numerical attributes and makes efficient use of computing resources [19], [22], [23].

K-Means Clustering Algorithm, the K-Means algorithm is used to cluster or segment outlet shopping transaction data based on the RFM model. In this research, the clustering process was carried out seven times (2–8 clusters). The steps taken in the clustering process were: (1) determine the number of clusters, which makes it easier to define shopping transaction patterns in outlet segmentation; (2) determine the initial centroid values by taking random data objects, with the centroid of a cluster computed as shown in Formula 2.

$$V_{ij} = \frac{1}{N_i} \sum_{k=1}^{N_i} X_{kj} \qquad (2)$$

V_ij is the centroid of the i-th cluster for the j-th variable, N_i is the number of data points that are members of the i-th cluster, i is the index of the cluster, k is the index of the data within the cluster, j is the index of the variable, and X_kj is the k-th data value in the cluster for the j-th variable.

(3) Calculate the distance between each centroid and each object point as shown in Formula 3.

$$D_e = \sqrt{(x_i - s_i)^2 + (y_i - t_i)^2} \qquad (3)$$

D_e is the Euclidean distance, i is the index of the data, (x, y) are the data coordinates, and (s, t) are the centroid coordinates.

The closeness of two objects is determined by the distance between them. Likewise, the proximity of a data point to a particular cluster is determined by the distance between the data point and the center of the cluster. In this stage, the distance from each data point to each cluster center is calculated using the Euclidean distance formula [22]. The closest distance between a data point and a cluster center determines which cluster the data point belongs to. (4) Recalculate the centroid of each cluster and allocate all objects to the cluster with the closest new centroid. If any object moves to a different cluster, repeat from step 2; if no object moves, the clustering process is complete.

Evaluation of Cluster, the K-Means cluster results are evaluated using the silhouette index (SI). This method is a validity criterion based on geometric considerations of cohesion, which measures how closely related the objects within a cluster are, and separation, which measures how far a cluster is separated from the other clusters [23]. The formula used to obtain the silhouette index value is shown in Formula 4.

$$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}} \qquad (4)$$

s_i is the silhouette coefficient of point i, a_i is the average distance between point i and all other points in its own cluster, and b_i is the average distance between point i and all points in the nearest cluster other than its own.

RFM Score Analysis, an analysis is performed based on the assigned scores: each retail_id receives a score for the recency, frequency, and monetary attributes on a scale from 5 (highest) down to 1 (lowest) [2]. Table 2 shows the RFM analysis segments [20]:

Table 2. RFM Segment Analysis

Criteria | Description
Champions | Active customers who have recently made transactions, buy frequently, and spend the most.
Loyal customers | Customers who make regular purchases and are responsive to promotions.
Potential loyalists | New customers with average frequency.
Promising | Customers with recent purchases but who didn't spend a lot of money.
Need attention | Customers with above-average scores for recency, frequency, and monetary.
About to sleep | Customers with recency and frequency below average, who may be hibernating.
At risk | Customers who shopped some time ago and need to be reactivated.
Can't lose them | Customers who frequently made transactions in the past but have not made any for a long time.
Hibernating | Customers with high recency and low shopping value, who are likely to become lost customers (inactive customers).

Each attribute R, F, and M is converted to a value in the range 1 to 5, and each outlet is then assigned to a segment according to Table 2.

3. Results and Discussions

In the process of collecting and selecting data, an understanding of the running business is needed.

Figure 3 shows a sample of the shopping transaction dataset, and Table 3 explains its structure:

Table 3. Expenditure Transaction Data Structure

No | Field Name | Description
1 | Region | Region Name
2 | Subdist_nm | District name (sub-distribution)
3 | Retail_id | Outlet ID
4 | Retail_name | Outlet Name
5 | Wholesaler_id | Wholesale ID
6 | Wholesaler_name | Wholesale Name
7 | Order_date | Order date
8 | Order_no | Order Number
9 | Pcode | Product ID
10 | Category | Product category
11 | Principal | Principal product name
12 | Qty_sales_order | Number of transactions per transaction
13 | Amount_sales_order | Value-for-money transactions


Figure 3. Sample of historical shopping transaction data in 2022


In the formation of the RFM actual value and score value models, attributes are selected from the transaction dataset and aggregated per outlet to obtain the RFM values, which reduces the data to 280 outlets. The RFM values consist of recency, frequency, and monetary. Recency is the difference between the outlet's last transaction time within the 12 months and a specified reference time, namely January 1, 2023.

Frequency is the number of transactions carried out by the outlet, while monetary is the nominal amount spent by the outlet to buy products from wholesalers. Table 4 lists the attributes selected from the transaction dataset:

Table 4. Transaction Data Structure After Attribute Selection

No | Field Name | Description
1 | Retail_id | Outlet ID
2 | Order_date | Order date
3 | Order_no | Order Number
4 | Qty_sales_order | Number of transactions per sales order
5 | Amount_sales_order | Transaction monetary value

Table 5 is an example of data in the form of RFM actual values and RFM scores.

The RFM data frame in Table 5 is normalized using a standard scaler/z-score transformation for each outlet. Table 6 shows the data frame after the transformation.
Table 5. RFM Dataframe Model: Actual Values and Scores

retail_id recency frequency monetary R F M RFM_Segment RFM_Score


C100000641 3 13 47527750 5 2 3 523 10
C100000953 5 7 986650 4 1 1 411 6
C100002548 2 189 464154545 5 5 5 555 15
C100003179 5 26 24206851 4 3 2 432 9
C100003361 87 30 59794160 1 3 3 133 7
... ... ... ... ... ... ... ... ...
C100324412 25 6 17949450 2 1 2 212 5
C100324446 6 13 49567970 3 2 3 323 8
C100326000 50 3 13341500 2 1 2 212 5
C100327075 62 1 1744600 2 1 1 211 4
C100327998 55 1 4110300 2 1 1 211 4
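
The score columns in Table 5 can be illustrated with a small rank-based sketch. It is a simplification under stated assumptions: `quintile_scores` is a hypothetical helper, the five outlets are made-up values, and each attribute is scored independently (the nested sorting of F within R and of M within R and F is omitted). RFM_Segment and RFM_Score are formed by concatenating and summing the three scores, which matches the pattern visible in Table 5 (scores 5, 2, 3 give segment 523 and score 10).

```python
def quintile_scores(values, higher_is_better=True):
    """Give 5 to the best 20% of values, 4 to the next 20%, down to 1."""
    n = len(values)
    # Indices ordered from best to worst; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 5 - (rank * 5) // n
    return scores


# Hypothetical actual values for five outlets.
recency = [3, 87, 25, 258, 6]          # days since last order: lower is better
frequency = [13, 30, 6, 1, 189]        # number of transactions
monetary = [500, 900, 200, 100, 1500]  # amount spent

r = quintile_scores(recency, higher_is_better=False)
f = quintile_scores(frequency)
m = quintile_scores(monetary)

rfm_segment = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
rfm_score = [a + b + c for a, b, c in zip(r, f, m)]
```

With only five outlets each quintile holds exactly one outlet; on the 280 outlets of the study each quintile would hold 56.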

Table 6. Normalized RFM Frame Data

retail_id recency_standarscale Frequency_standarscale monetary_standarscale


C100000641 -0.552550 -0.536811 -0.344879
C100000953 -0.530903 -0.694918 -0.597927
... ... ... ...
C100327075 0.086048 -0.853025 -0.593806
C100327998 0.010283 -0.853025 -0.580943
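
The standardization behind Table 6 can be checked with Python's standard library. This is a sketch under assumptions: `z_scores` is a hypothetical helper, the recency values are illustrative, and the population standard deviation is used, which is also what scikit-learn's `StandardScaler` applies. The single-value example reuses the numbers from Section 2, where (73600 - 54000) / 16000 evaluates to 1.225.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Single-value example with a given mean and standard deviation.
example = (73600 - 54000) / 16000

# Hypothetical recency column (days).
recency = [3, 5, 87, 62, 55]
recency_standardized = z_scores(recency)
```

After the transformation the column has mean 0 and standard deviation 1, so R, F, and M contribute on comparable scales to the Euclidean distances used by K-Means.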

In Table 6, we can see that the RFM data frame, which initially had actual values, has been normalized using the standard scaler transformation.

Modeling in this research uses the K-Means clustering algorithm in Jupyter Notebook, with the parameters and commands shown in Figure 4.


Figure 4. K-Means Modeling

In the commands and parameters in Figure 4, the random state = 42 parameter seeds the random number generator so that the initial centroid initialization is the same on every run [26]. The n_cluster parameter sets the desired number of clusters. To get the optimal number of clusters, one can use the elbow and silhouette score methods [6], [27]. The elbow method is formed from the differences in SSE values for each number of clusters (2-8), as shown in Figure 5. By default, the K-Means model uses Euclidean distance calculations. Based on the elbow method in Figure 5, the sharpest elbow [9] is at value = 3 on the x axis, which means that the number of clusters selected is 3.

Figure 5. Elbow Method for Determining the Number of Clusters

The K-Means model is run with n_cluster = 3, resulting in the final-iteration centroid values shown in Table 7.
Table 7. Final Centroid Results

Cluster | Members | Centroid R | Centroid F | Centroid M
0 | 200 | -0.3929005 | -0.181334 | -0.248403
1 | 35 | -0.5386339 | 1.895777 | 2.114288
2 | 45 | 2.1651619 | -0.668566 | -0.540429
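
The clustering and evaluation steps from Section 2 can be sketched in plain Python to make Formulas 2 to 4 concrete. This is an illustrative implementation under assumptions, not the code shown in Figure 4: the function names and the nine data rows are hypothetical, the seed of 42 only mirrors the paper's random state and will not reproduce its centroids, and an empty cluster simply keeps its previous centroid.

```python
import math
import random


def euclidean(p, q):
    """Formula 3, generalized to any number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, seed=42, max_iter=100):
    """Random initial centroids, then assign and update until no point moves."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no object moved to another cluster
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # Formula 2: mean of the members, per variable
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances to the assigned centroid (elbow method)."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all points (Formula 4)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: s_i = 0 for a single-member cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical standardized (R, F, M) rows, not the paper's data.
data = [
    (-0.5, -0.2, -0.3), (-0.4, -0.1, -0.2), (-0.3, -0.3, -0.2),
    (-0.6, 1.9, 2.1), (-0.5, 1.8, 2.0), (-0.4, 2.0, 2.2),
    (2.2, -0.7, -0.5), (2.1, -0.6, -0.6), (2.0, -0.8, -0.5),
]
for k in range(2, 5):
    labels, centroids = k_means(data, k)
    print(k, round(sse(data, labels, centroids), 4), round(silhouette(data, labels), 4))
```

The loop prints the SSE and the mean silhouette for each K, which are the two quantities the paper compares across K = 2 to 8. A library implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop in practice.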

The scatter plot in Figure 6 shows the distribution of the data for the recency and monetary attributes:

Figure 6. Scatter Plot of Recency Against Monetary

The green dots are cluster 1 data points, the purple dots are cluster 0 data points, and the yellow dots are cluster 2 data points. The data points are grouped by the similarity of their normalized R and M values. In addition, Figure 6 shows that each cluster has a centroid, marked with a red X.

In the Cluster Evaluation phase, the silhouette index of the RFM actual value model is compared with that of the RFM score value model. To obtain the best model for interpreting the clusters and prototyping the clustering results, a comparison test was carried out by running both the actual RFM value model and the RFM score model through the K-Means model to obtain the silhouette index values. Table 8 shows the results of the comparison of the two models:

Table 8. Comparison Results of Actual RFM and RFM Score Based on Silhouette

Number of Clusters | Actual Value RFM | RFM Score Value
2 | 0.561928 | 0.485876
3 | 0.624646 | 0.479203
4 | 0.581366 | 0.493759
5 | 0.568411 | 0.434975
6 | 0.498001 | 0.432770
7 | 0.475831 | 0.391266
8 | 0.482351 | 0.402628

In Table 8, a silhouette index or score close to 1 means the cluster quality is relatively good. For the RFM score value model, the best cluster quality is at K = 4 with a silhouette value of 0.493759, while for the RFM actual value model, the best cluster quality is at K = 3 with a silhouette value of 0.624646. The actual value model has a higher silhouette value than the score model at every number of clusters tested, so the actual value model is considered better than the score value model in this study.

Apart from that, Table 8 also shows that, for the actual value model, the elbow and silhouette index methods agree on the number of clusters, namely K = 3.

RFM Analysis of Cluster Results

The K-Means modeling uses the actual value RFM dataset because the actual value RFM model had the better silhouette values in the comparative evaluation stage. However, the RFM score model is also used in this research to add RFM segmentation analysis rules, which provide information and knowledge for interpreting and understanding outlet segmentation. Table 9 shows a sample of the cluster results after adding the segment and score attributes.

The data frame in Table 9 groups outlets by cluster and also assigns each outlet its segment and score criteria. For example, Retail_id C100000641 is a member of cluster 0 with the Potential Loyalist and Gold criteria, and C100007252 is a member of cluster 2 with the Hibernating and Green criteria. These attributes add value to the cluster interpretation analysis, which is useful for the business domain.
Table 9. Cluster Results Data Frame with Segments and Scores

retail_id recency frequency monetary cluster RFM_Segment RFM_Score segment score


C100000641 3 13 47527750 0 523 10 Potential loyalists Gold
C100000953 5 7 986650 0 411 6 Promising Bronze
C100002548 2 189 4.64E+08 1 555 15 Champions Platinum
C100003179 5 26 24206851 0 432 9 Potential loyalists Silver
C100003361 87 30 59794160 0 133 7 At risk Bronze
C100006134 13 35 67717750 0 244 10 At risk Gold
C100006393 44 3 13927600 0 212 5 Hibernating Green
C100006680 5 8 1016650 0 421 7 Potential loyalists Bronze
C100006808 2 54 4.51E+08 1 555 15 Champions Platinum
C100007252 258 6 3329000 2 111 3 Hibernating Green
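
Given rows like those in Table 9, the share of each segment within a cluster can be tabulated as follows. This is a minimal sketch: only the cluster and segment columns of the ten sample rows above are used, so the percentages describe this sample and not the full set of 280 outlets, and the variable names are illustrative.

```python
from collections import Counter

# (cluster, segment) pairs copied from the ten sample rows of Table 9.
rows = [
    (0, "Potential loyalists"), (0, "Promising"), (1, "Champions"),
    (0, "Potential loyalists"), (0, "At risk"), (0, "At risk"),
    (0, "Hibernating"), (0, "Potential loyalists"), (1, "Champions"),
    (2, "Hibernating"),
]

pair_counts = Counter(rows)
cluster_sizes = Counter(cluster for cluster, _ in rows)

# Percentage of each segment within its own cluster, rounded to one decimal.
composition = {
    (cluster, segment): round(100 * n / cluster_sizes[cluster], 1)
    for (cluster, segment), n in sorted(pair_counts.items())
}

for (cluster, segment), pct in composition.items():
    print(cluster, segment, pair_counts[(cluster, segment)], pct)
```

The percentages are computed relative to the size of each cluster rather than to the total number of outlets.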

Apart from that, this research provides information on outlet mapping based on existing clusters and score criteria, as can be seen in Figure 7.

Figure 7. Map Outlet RFM Segment

The outlet segment map in Figure 7 shows the composition of segment criteria: hibernating (30%, i.e., 84 of the 280 outlets according to Table 10), potential loyalists (22%), loyal outlets (18%), champions (17%), at risk (8%), new outlets (2%), can't lose them (1.1%), need attention (0.7%), and promising (0.7%).

Figure 8. Map Outlet RFM Score

The outlet score map in Figure 8 shows the composition of the score criteria: platinum (41%), green (23%), silver (15%), bronze (13%), and gold (9%).

Figure 9. Map Outlet Cluster, Segment, and Score

The map in Figure 9 combines clusters, segments, and scores, as detailed in Table 10:

Table 10. Cluster Map, Segment, and Score

Cluster | Segment | Members | Percentage of cluster (%)
0 | About to sleep | 2 | 1
0 | At risk | 16 | 8
0 | Can't lose them | 2 | 1
0 | Champions | 27 | 13.5
0 | Hibernating | 45 | 22.5
0 | Loyal customers | 39 | 19.5
0 | Need attention | 1 | 0.5
0 | New Outlet | 5 | 2.5
0 | Potential loyalists | 61 | 30.5
0 | Promising | 2 | 1
1 | At risk | 1 | 2.9
1 | Can't lose them | 1 | 2.9
1 | Champions | 21 | 60
1 | Loyal Outlet | 11 | 31.4
1 | Need attention | 1 | 2.9
2 | At risk | 6 | 13.3
2 | Hibernating | 39 | 86.7

Based on Table 9, the cluster results can be interpreted by looking at the data distribution in the graph in Figure 10:


Figure 10. Actual Model RFM Distribution Graph

In Figure 10, the red data points are cluster 1, with high monetary value, low recency, and high frequency. The blue data points are cluster 0, with medium recency, medium frequency, and medium monetary value. The yellow data points are cluster 2, with high recency, low shopping frequency, and low monetary value. In addition to the interpretation based on Figure 10, this research explains the cluster results using the RFM segment and score interpretation, which can be seen in Table 11:


Table 11. Interpretation of Cluster Results Based on RFM Analysis

Cluster | Outlet Type | % | RFM Label | RFM Interpretation | Suggestion
0 | New Outlet | 71% | 30.5% Potential Loyalists; 19.5% Loyal Outlet | Outlets with low recency, medium shopping frequency, and medium shopping value. Based on RFM segmentation, 30.5% of this cluster are potential loyalists and 19.5% are loyal outlets. There are also hibernating outlets, which may be lost if not handled properly. | Product promotions, shopping balance credits, and other features should be launched for these outlets to increase shopping interest and turn them into champions. Special attention to this cluster is important because it contains potentially hibernating outlets that need to be reviewed each period.
1 | Loyal Outlet | 13% | 31.4% Loyal Outlet; 60% Champions | Outlets that transact most frequently, with the highest shopping value (monetary) and the lowest recency. | Management should provide high-value information and products and solicit reviews from these outlets regarding improved service and better products.
2 | Lost Outlet | 16% | 86.7% Hibernating | Outlets with high recency (a long time without shopping transactions), low shopping frequency, and low shopping value. RFM segmentation shows that most outlets in this cluster are hibernating outlets. | It is necessary to survey the condition of each outlet to determine whether it is still actively operating. Outlets that are still active are directed to become potential loyal outlets; outlets that are not are removed from the customer base, which increases salesman productivity by freeing time to look for new outlets.

4. Conclusion

Based on the discussion and research results, the conclusions that can be drawn are: time-based transaction data (time series data) can be transformed into data in the Recency, Frequency, and Monetary (RFM) model; the cluster quality of the RFM actual value model is better than that of the RFM score model, based on a comparison using the silhouette index or score; the K-Means algorithm can carry out the outlet clustering process, with the number of clusters (K) equal to 3 based on the elbow and silhouette score methods; the outlet clustering process creates data that carries a cluster label and can serve as supervised learning data, so that it can be used to analyze patterns or trends using other data mining models such as classification, estimation, and prediction; and business actors (business domain) can plan marketing strategies and treat customers appropriately based on the results of the outlet cluster interpretation.

The following are suggestions for further research: use historical outlet shopping transaction data over a wider area, for example, district, city, or even provincial transaction data; use other clustering algorithms such as agglomerative, DBSCAN, GMM, and others; and use the same data sources to segment products and find relationships (associations) in outlet transaction behavior, so that combining outlet segmentation and product associations will provide deeper and more accurate information.

References

[1] M. Y. Smaili and H. Hachimi, “Hybridization of improved binary bat algorithm for optimizing targeted offers problem in direct marketing campaigns,” Adv. Sci. Technol. Eng. Syst., vol. 5, no. 6, pp. 239–246, 2020, doi: 10.25046/aj050628.
[2] A. J. Christy, A. Umamakeswari, L. Priyatharsini, and A. Neyaa, “RFM ranking – An effective approach to customer segmentation,” J. King Saud Univ. - Comput. Inf. Sci., vol. 33, no. 10, pp. 1251–1257, 2021, doi: 10.1016/j.jksuci.2018.09.004.
[3] R. Srivastava, A. Parvaneh, and H. Abbasimehr, “Identification of Customer Clusters using RFM Model: A Case of Diverse Purchaser Classification,” Int. J. Bus. Anal. Intell., vol. 4, no. 2, p. 6, 2016.
[4] S. Dibb, “Market segmentation: Strategies for success,” Mark. Intell. Plan., vol. 16, no. 7, pp. 394–406, 1998, doi: 10.1108/02634509810244390.
[5] F. Safari, N. Safari, and G. A. Montazer, “Customer lifetime value determination based on RFM model,” Mark. Intell. Plan., vol. 34, no. 4, pp. 446–461, 2016, doi: 10.1108/MIP-03-2015-0060.

[6] Y. Huang, M. Zhang, and Y. He, “Research on improved RFM customer segmentation model based on K-Means algorithm,” Proc. - 2020 5th Int. Conf. Comput. Intell. Appl. ICCIA 2020, pp. 24–27, 2020, doi: 10.1109/ICCIA49625.2020.00012.
[7] J. Wei, S. Lin, and H. Wu, “A review of the application of RFM model,” African J. Bus. Manag., vol. 4, no. 19, pp. 4199–4206, 2010.
[8] P. D. Bangsa and I. Hermawan, “Jurnal Teknologi Terpadu,” J. Teknol. Terpadu, vol. 7, no. 1, pp. 15–22, 2021.
[9] J. Wu et al., “An Empirical Study on Customer Segmentation by Purchase Behaviors Using a RFM Model and K-Means Algorithm,” Math. Probl. Eng., vol. 2020, no. April 2019, 2020, doi: 10.1155/2020/8884227.
[10] D. Chen, S. L. Sain, and K. Guo, “Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining,” J. Database Mark. Cust. Strateg. Manag., vol. 19, no. 3, pp. 197–208, 2012, doi: 10.1057/dbm.2012.17.
[11] B. Sohrabi and A. Khanlari, “Customer lifetime value determination based on RFM model,” Mark. Intell. Plan., vol. 14, 2007, doi: 10.1108/MIP-03-2015-0060.
[12] C. Y. Tsai and C. C. Chiu, “A purchase-based market segmentation methodology,” Expert Syst. Appl., vol. 27, no. 2, pp. 265–276, 2004, doi: 10.1016/j.eswa.2004.02.005.
[13] S. H. Shihab, S. Afroge, and S. Z. Mishu, “RFM Based Market Segmentation Approach Using Advanced K-means and Agglomerative Clustering: A Comparative Study,” 2019 Int. Conf. Electr. Comput. Commun. Eng., pp. 1–4, 2019.
[14] D. Devarapalli, S. Veera, V. Satya, S. Geddam, A. S. Sravya, and A. P. Devi, “Analysis of RFM Customer Segmentation Using Clustering Algorithms,” Int. J. Mech. Eng. Vol., vol. 7, no. February, 2022.
[15] M. Aliyev, E. Ahmadov, H. Gadirli, A. Mammadova, and E. Alasgarov, “Segmenting Bank Customers via RFM Model and Unsupervised Machine Learning,” 2020.
[16] B. Arivazhagan and G. Vijaiprabhu, “An Enhanced Hierarchical Model for Customer Segmentation in Customer Relationship Management with Demographic, Recency, Frequency and Monetary Values,” Int. J. Mech. Eng., vol. 7, no. 2, pp. 1878–1886, 2022.
[17] D. Elzanfaly and S. Salama, “Investigation in Customer Value Segmentation Quality under Different Preprocessing Types of RFM Attributes,” vol. 4, no. 4, pp. 5–10, 2016.
[18] A. Gülcü and S. Çalişkan, “Clustering electricity market participants via FRM models,” Intell. Decis. Technol., vol. 14, no. 4, pp. 481–492, 2020, doi: 10.3233/IDT-200092.
[19] C. Yuan and H. Yang, “Research on K-Value Selection Method of K-Means Clustering Algorithm,” J, vol. 2, no. 2, pp. 226–235, 2019, doi: 10.3390/j2020016.
[20] I. Karacan, I. Erdogan, and U. Cebeci, “A Comprehensive Integration of RFM Analysis, Cluster Analysis, and Classification for B2B Customer Relationship Management,” Proc. Int. Conf. Ind. Eng. Oper. Manag., pp. 497–508, 2021.
[21] D. A. Nasution, H. H. Khotimah, and N. Chamidah, “Perbandingan Normalisasi Data untuk Klasifikasi Wine Menggunakan Algoritma K-NN,” Comput. Eng. Sci. Syst. J., vol. 4, no. 1, p. 78, 2019, doi: 10.24114/cess.v4i1.11458.
[22] B. Rizki, N. G. Ginasta, M. A. Tamrin, and A. Rahman, “Customer Loyality Segmentation on Point of Sale System Using Recency-Frequency-Monetary (RFM) and K-Means,” J. Online Inform., vol. 5, no. 2, p. 130, 2020, doi: 10.15575/join.v5i2.511.
[23] A. Nowak-Brzezinska and C. Horyn, “Outliers in rules - the comparision of LOF, COF and K-MEANS,” vol. 00, 2020, doi: 10.1016/j.procs.2020.09.152.

DOI: https://doi.org/10.29207/resti.v7i6.5416
Creative Commons Attribution 4.0 International License (CC BY 4.0)
