Final Compare
Decree of the Director General of Higher Education, Research and Technology, No. 158/E/KPT/2021
Validity period from Volume 5 Number 2 of 2021 to Volume 10 Number 1 of 2026
JURNAL RESTI
(Rekayasa Sistem dan Teknologi Informasi)
Vol. 7 No. 6 (2023) 1430 - 1438 ISSN Media Electronic: 2580-0760
Comparison of the RFM Model's Actual Value and Score Value for
Clustering
Samidi1, Ronal Yulyanto Suladi2, Dewi Kusumaningsih3
1,2,3Master of Computer Science, Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia
Abstract
Clustering algorithms and Recency-Frequency-Monetary (RFM) models are widely used in e-commerce, banking, telecommunications, and other industries to obtain customer segmentation. The RFM model assesses each customer record by the recency and frequency of its transactions and by the monetary value of those transactions. The choice of RFM model also influences the analysis of cluster results: a good model yields clusters that are compact within each cluster (intra-cluster) and well separated from other clusters (inter-cluster). Through an experimental approach, this research aims to find the best dataset transformation model between actual RFM values and RFM scores. The method compares the actual RFM value model with the RFM score model, using the silhouette score as the indicator of the best clustering result from the K-Means algorithm. The subject of this research is a stall-based e-commerce application, with data taken from the Wiradesa area, Central Java. The resulting dataset consists of 273,454 rows with 18 attributes, covering January 2022 to December 2022, collected from the historical shopping transactions of outlets with wholesalers. The dataset was transformed with the RFM method into actual values and score values, and each version was then used to obtain the best clusters. The results show that time-based (time series) transaction data can be transformed into the RFM model, and that the RFM actual value model is better than the RFM score model, with a silhouette score of 0.624646 at K = 3 clusters. The clustering process also yields data carrying a cluster label, which can serve as supervised learning data.
Keywords: RFM model; RFM actual value; RFM score value; clustering
model has several score calculation techniques, for example, the customer quintile method and the behavior quintile method [7]. The actual RFM approach, by contrast, combines aggregate values (sum/count, mean, min, max, and median) [10], which are then analyzed against the average of each attribute R, F, and M, so that each attribute is marked with the symbol (↑) when its value is above the average (high) and with the symbol (↓) when its value is below the average (low) [11]. The RFM actual value model generally normalizes the R, F, and M attribute values with the standard scaler/z-score technique, which replaces the scoring technique of the RFM score model [12].

Customer segmentation or grouping is commonly obtained with a clustering algorithm. Clustering algorithms such as K-Means, Agglomerative, and DBSCAN group data into several groups based on the similarity of the data, so that data with similar attribute characteristics are grouped in one cluster (homogeneous), while data with different attribute characteristics (heterogeneous) are grouped in a different cluster. Clustering with various comparisons of cluster algorithms and RFM models has been widely applied in various fields, for example, online retail data [13], e-commerce data [14], banking transaction data [15], and telecommunication company transaction data [16].

Previous research stated that the RFM model selected as input to the clustering algorithm influences the quality of the cluster results [7], [8], [9]. The quality of the cluster results is calculated with one of the cluster validation methods, the sum of squared errors [17]. In addition, the selection of the right RFM model also influences the analysis of cluster results; the output is more compact within each cluster (intra-cluster) and more separate from other clusters (inter-cluster) [18].

The object of this research is an e-commerce platform developed to accommodate the needs of the traditional retail (outlet) ecosystem. The platform connects retailers and outlets with wholesalers in the same sub-district area: wholesalers register all their products, and outlets then carry out shopping transactions for those products by accessing the platform digitally. To increase salespersons' efficacy in visiting active merchants and meeting retail priorities and demands, this e-commerce platform must group the current retail environment. Currently, salespeople visit locations based solely on retail demand and without regard to priority, which prevents retailers from meeting their growth ambitions.

This study compares the actual value of the RFM model with the RFM score. The comparison is based on the cluster validation value, which is obtained by running one of the clustering algorithms on each model, while the elbow method is used to determine the best number of clusters [19]. The dataset used to form the RFM model is the outlet-to-wholesaler shopping transaction history queried from the e-commerce platform, with a total of 273,454 transactions and 18 attributes from January 2022 to December 2022. This study retrieved transaction data from one sub-district, namely Wiradesa, in the district of Pekalongan, Central Java. The RFM model with the best cluster validation is used as the input for the clustering model. The cluster output is then interpreted based on RFM segmentation analysis to obtain more interesting information and knowledge than cluster parameters alone provide [20], with the aim of making the interpretation of the clusters deeper and more varied as suggestions and recommendations for the business domain.

2. Research Methods

This research goes through the stages shown in Figure 1.

Figure 1. Research Framework

Collect and Select Data, collecting and selecting data and information sourced from literature studies, reading
DOI: https://doi.org/10.29207/resti.v7i6.5416
Creative Commons Attribution 4.0 International License (CC BY 4.0)
and studying research related to the research topics, observing research objects, viewing and understanding outlet shopping transaction data by querying databases, and systematically recording and observing the problems examined in the research object, with the aim of obtaining data as input to the RFM model process.

Formation of the RFM Model, historical transaction data serves as the data source for the RFM model, which is based on earlier research by [2], [10], [17] and others. This research uses historical outlet shopping transaction data for 12 months (January–December) in 2022.

RFM Actual Value, the RFM model describes customer consumption behavior based on past transaction databases in a simplified form with three attributes [2], namely Recency (R), Frequency (F), and Monetary Value (M). Recency (R) is the interval between a customer's most recent transaction and a specific reference time. The shorter the interval, the greater the R value. Frequency (F) is the number of transactions in a certain period, for example, twice in one year or twice in one month. The higher the frequency, the greater the F value. Monetary Value (M) is the value of the products purchased, in the form of money, in a certain period. The greater the amount of money in that period, the higher the value of M.

Figure 2 shows the RFM actual value model diagram.
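The three attributes can be derived from raw transaction rows with a simple aggregation per outlet. The following is a minimal sketch using only the Python standard library, not the authors' implementation: the function name `rfm_actual`, the tuple layout, the sample rows, and the choice of 31 December 2022 as the reference date are all assumptions made for illustration. Recency is expressed here as days since the last order, so a smaller number means a more recent outlet.

```python
from collections import defaultdict
from datetime import date


def rfm_actual(transactions, reference_date):
    """Aggregate (retail_id, order_no, order_date, amount) rows into R, F, M per outlet."""
    last_order = {}
    order_numbers = defaultdict(set)
    total_amount = defaultdict(float)
    for retail_id, order_no, order_date, amount in transactions:
        # Keep the most recent order date per outlet (basis for recency).
        if retail_id not in last_order or order_date > last_order[retail_id]:
            last_order[retail_id] = order_date
        order_numbers[retail_id].add(order_no)  # distinct orders give frequency
        total_amount[retail_id] += amount       # summed amounts give monetary value
    return {
        retail_id: {
            "recency_days": (reference_date - last_order[retail_id]).days,
            "frequency": len(order_numbers[retail_id]),
            "monetary": total_amount[retail_id],
        }
        for retail_id in last_order
    }


# Hypothetical sample rows; outlet "A" has two product lines on order SO-1.
rows = [
    ("A", "SO-1", date(2022, 1, 10), 100.0),
    ("A", "SO-1", date(2022, 1, 10), 50.0),
    ("A", "SO-2", date(2022, 12, 1), 200.0),
    ("B", "SO-3", date(2022, 6, 15), 75.0),
]
rfm = rfm_actual(rows, reference_date=date(2022, 12, 31))
print(rfm["A"])
print(rfm["B"])
```

Counting distinct order numbers rather than rows keeps a multi-product order from inflating the frequency; whether the original study counted orders or rows is not stated in the text.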
The results of the process of forming the dataset into the RFM model are stored in a data frame named DF_RFM.

RFM Score Value, this is an RFM model that transforms RFM values into a quantitative score; the steps are [17]:
a. Sort the dataset by attribute R in descending order, from the most recent date to the oldest;
b. Divide the dataset into 5 quintiles and give a value of 5 to the first 20% of the dataset, a value of 4 to the second 20% of the dataset, and so on down to a value of 1;
c. Repeat steps a and b for attributes F and M by sorting F and M in descending order and assigning values;
d. Sort F within each category of R, and sort M within each combination of categories of R and F.

This model produces RFM segmentation with the criteria and scoring [2], [20] that are then used in the RFM analysis, as shown in Table 1.

Table 1. RFM Scoring

Criteria             Recency Score   Frequency and Monetary Score
Champions            5               4-5
Loyal customers      3-4             4-5
Potential loyalists  4-5             2-3
Promising            4               1
Can’t lose them      1-2             5
At risk              1-2             3-4
About to sleep       3               1-2
Hibernating          1-2             1-2
New customers        5               1
Need attention       3               3

Standard Scaler Normalization, normalization is carried out so that the ranges (scales) of the recency, frequency, and monetary values do not differ much. In this study, normalization uses standardization, or z-score normalization, which is based on the mean and standard deviation, as shown in Formula 1 [21].

$v' = \frac{v - \mu_A}{s}$    (1)

Here v is the original value of attribute A, µ is the mean of that attribute, and s is its standard deviation. For example, the z-score of 73600 when µ = 54000 and s = 16000 is v' = (73600 - 54000)/16000 = 1.225.

Each attribute R, F, and M with actual values is normalized using the standard scaler technique, after which each attribute has a mean of 0 and a standard deviation of 1.

The K-Means algorithm is the clustering algorithm most widely used in data grouping processes in various industrial and scientific fields, such as marketing, computer vision, and geo-statistics. The advantages of
K-Means are that it is simple and easy to implement and has a relatively fast processing speed. In addition, the algorithm is very good at processing quantitative data with numerical attributes and uses computing resources efficiently [19], [22], [23].

K-Means Clustering Algorithm, the K-Means algorithm is used to cluster or segment outlet shopping transaction data based on the RFM model. In this research, the clustering process was carried out seven times (K = 2 to 8 clusters). The steps taken in the clustering process were as follows.

Determine the number of clusters, which will make it easier to define shopping transaction patterns in outlet segmentation. Determine the initial centroid values by taking random data objects; the centroid of a cluster is the mean of its members, as shown in Formula 2.

$V_{ij} = \frac{1}{N_i} \sum_{k=1}^{N_i} X_{kj}$    (2)

V_ij is the centroid of the i-th cluster for the j-th variable, N_i is the number of data points that are members of the i-th cluster, i is the index of the cluster, k is the index of the data within the cluster, j is the index of the variable, and X_kj is the k-th data value in the cluster for the j-th variable.

Calculate the distance between the centroid point and each object point, as shown in Formula 3.

$D_e = \sqrt{(x_i - s_i)^2 + (y_i - t_i)^2}$    (3)

D_e is the Euclidean distance, i is the index of the data, (x, y) are the data coordinates, and (s, t) are the centroid coordinates.

The closeness of two objects is determined by the distance between them. Likewise, the proximity of a data point to a particular cluster is determined by the distance between the data point and the center of the cluster. In this stage, the distance of each data point to each cluster center is calculated using the Euclidean distance formula [22]. The shortest distance between a data point and a cluster center determines which cluster the data point belongs to. New centroids are then computed, and all objects are allocated to the cluster with the closest new centroid. If any objects move to a different cluster, the distance calculation and allocation are repeated; if no objects move, the clustering process is complete.

Evaluation of Cluster, the K-Means cluster results are evaluated using the silhouette index (SI). This method is a validity criterion based on geometric considerations of cohesion, which measures how closely related the objects in a cluster are, and separation, which measures how far a cluster is separated from the other clusters [23]. The formula used to obtain the silhouette index value is shown in Formula 4.

$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}}$    (4)

s_i is the silhouette coefficient of point i, a_i is the average distance between point i and all other points in cluster A (the cluster to which point i belongs), and b_i is the average distance between point i and all points in the nearest cluster other than A.

RFM Score Analysis, perform an analysis based on the assigned scores, giving each retail_id a score for the recency, frequency, and monetary attributes. The score is on a scale from 5 to 1: the highest value is 5, followed by 4, 3, 2, and 1 [2]. Table 2 shows the RFM analysis segments [20].

Table 2. RFM Segment Analysis

Criteria            Description
Champions           Active customers who have recently made transactions, buy frequently, and spend the most.
Loyal customers     Customers who make regular purchases and are responsive to promotions.
Potential loyalist  New customers with average frequency.
Promising           Customers with recent purchases but who didn't spend a lot of money.
Needs attention     Customers with above-average scores for recency, frequency, and monetary.
About to sleep      Customers with below-average recency and frequency who may be hibernating.
At risk             Customers who shopped some time ago and need to be reactivated.
Can’t lose them     Customers who frequently made transactions in the past but have not made transactions for a long time.
Hibernating         Customers with high recency and low shopping value who are likely to become lost (inactive) customers.

Each attribute R, F, and M is changed to a value in the range 1 to 5 and interpreted according to Table 2.

3. Results and Discussions

In the process of collecting and selecting data, an understanding of the running business is needed.

Figure 3 shows the form of the shopping transaction dataset, and Table 3 explains its fields.

Table 3. Expenditure Transaction Data Structure

No  Field Name          Description
1   Region              Region name
2   Subdist_nm          District name (sub-distribution)
3   Retail_id           Outlet ID
4   Retail_name         Outlet name
5   Wholesaler_id       Wholesale ID
6   Wholesaler_name     Wholesale name
7   Order_date          Order date
8   Order_no            Order number
9   Pcode               Product ID
10  Category            Product category
11  Principal           Principal product name
12  Qty_sales_order     Quantity ordered per transaction
13  Amount_sales_order  Monetary value of the transaction
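Once transaction fields such as these are aggregated into actual R, F, and M values, each attribute is standardized with Formula 1 before clustering. The sketch below illustrates that step with the Python standard library only. The function name `standardize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption (it is the convention of scikit-learn's `StandardScaler`; the text does not say which variant was used).

```python
import statistics


def standardize(values):
    """Apply Formula 1, v' = (v - mean) / std, to one attribute column."""
    mu = statistics.fmean(values)
    s = statistics.pstdev(values)  # population standard deviation (assumption)
    if s == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / s for v in values]


# Worked example from the text: v = 73600, mean = 54000, std = 16000.
worked_example = (73600 - 54000) / 16000

# Hypothetical monetary values with mean 5 and population std 2.
monetary = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(monetary)
print(worked_example)
print(z)
```

After standardization every column has mean 0 and standard deviation 1, so no single attribute dominates the Euclidean distances used by K-Means.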
In Table 6, we can see that the RFM data frame, which initially held actual values, was normalized using the standard scaler transformation.

Modeling in this research uses the K-Means clustering algorithm and the Jupyter Notebook tool, with parameters and commands as shown in Figure 4.
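The parameters and commands of Figure 4 are not reproduced in this text. As an illustration of the K-Means procedure from the Research Methods section (random initial centroids, Euclidean distance as in Formula 3, centroid update as in Formula 2, repeat until no object changes cluster), here is a minimal dependency-free sketch. The function names, the optional `initial_centroids` argument, and the sample points are assumptions for illustration and do not represent the authors' notebook code.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension (Formula 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, initial_centroids=None, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: given explicitly, or k random data objects.
    centroids = [tuple(c) for c in (initial_centroids or rng.sample(points, k))]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Allocate every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda i: euclidean(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # no object moved, so clustering is complete
        labels = new_labels
        # Recompute each centroid as the mean of its members (Formula 2).
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical, already standardized 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2, initial_centroids=[(0, 0), (10, 10)])
print(labels)
print(centroids)
```

Passing `initial_centroids` makes the run reproducible; with random initialization the result can depend on the seed, which is one reason library implementations usually run several initializations.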
The scatter plot in Figure 6 shows the distribution of the data for the monetary and recency attributes.

RFM value model and the RFM score model into the K-Means model to obtain the Silhouette Index value. Table 8 shows the results of the comparison of the two models.

Table 8. Comparison Results of Actual RFM and RFM Score Based on Silhouette
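The Silhouette Index behind this comparison follows Formula 4. The sketch below computes the mean silhouette coefficient for a given labeling using only the standard library. It is an illustration under stated assumptions: the function name and sample points are hypothetical, b_i is taken as the mean distance to the nearest other cluster (the usual definition), and a point alone in its cluster is given a coefficient of 0 by convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Formula 4); needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for a single-member cluster
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, hence the division by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 1-D points: a sensible labeling versus a poor one.
pts = [(0,), (1,), (10,), (11,)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, so running the same K-Means configuration on both RFM representations and comparing this score is one way to reproduce the kind of comparison the table reports.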
The K-Means modeling is processed using the actual value RFM dataset because the actual value RFM model has the better silhouette value in the comparative evaluation stage. However, in this research, the RFM score model analysis is also used to add RFM segmentation analysis rules, which can provide information and knowledge for interpreting and understanding outlet segmentation. Table 9 shows a sample of cluster results after adding the segment and score attributes.

The data frame in Table 9 displays the grouping of outlets by cluster together with other information, where outlets are also divided by segment and score criteria. Retail_id C100000641 is a member of cluster 0 with the Potential Loyalist and Gold criteria, and C100007252 is a member of cluster 2 with the Hibernating and Green criteria. These results add value to the cluster interpretation analysis, which is useful for the business domain.

Table 9. Cluster Results Data Frame with Segments and Scores
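The segment attribute added to the cluster results can be derived from the scoring rules in Table 1. The following sketch is a simplified illustration, not the authors' procedure: `quintile_scores` ranks each attribute independently (the nested sorting of F within R and M within R and F is omitted), the lookup takes a combined frequency-and-monetary score as input because the text does not state how F and M are combined, and all names and sample values are hypothetical.

```python
def quintile_scores(values, higher_is_better=True):
    """Score values 5..1 by rank: the best 20% get 5, the next 20% get 4, and so on."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 5 - (rank * 5) // n
    return scores


# Table 1: (segment, allowed recency scores, allowed frequency-and-monetary scores).
SEGMENTS = [
    ("Champions", {5}, {4, 5}),
    ("Loyal customers", {3, 4}, {4, 5}),
    ("Potential loyalists", {4, 5}, {2, 3}),
    ("Promising", {4}, {1}),
    ("Can't lose them", {1, 2}, {5}),
    ("At risk", {1, 2}, {3, 4}),
    ("About to sleep", {3}, {1, 2}),
    ("Hibernating", {1, 2}, {1, 2}),
    ("New customers", {5}, {1}),
    ("Need attention", {3}, {3}),
]


def segment(r_score, fm_score):
    """Look up the Table 1 segment for a recency score and a combined F/M score."""
    for name, r_allowed, fm_allowed in SEGMENTS:
        if r_score in r_allowed and fm_score in fm_allowed:
            return name
    raise ValueError(f"no segment for R={r_score}, FM={fm_score}")


# Recency in days: fewer days is better, so the smallest values score 5.
recency_days = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
r_scores = quintile_scores(recency_days, higher_is_better=False)
print(r_scores)
print(segment(5, 5), "|", segment(1, 1))
```

The ten rows of Table 1 cover every combination of scores from 1 to 5 exactly once, so the lookup never falls through for valid input.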
[6] Y. Huang, M. Zhang, and Y. He, “Research on improved RFM customer segmentation model based on K-Means algorithm,” Proc. - 2020 5th Int. Conf. Comput. Intell. Appl. ICCIA 2020, pp. 24–27, 2020, doi: 10.1109/ICCIA49625.2020.00012.
[7] J. Wei, S. Lin, and H. Wu, “A review of the application of RFM model,” African J. Bus. Manag., vol. 4, no. 19, pp. 4199–4206, 2010.
[8] P. D. Bangsa and I. Hermawan, “Jurnal Teknologi Terpadu,” J. Teknol. Terpadu, vol. 7, no. 1, pp. 15–22, 2021.
[9] J. Wu et al., “An Empirical Study on Customer Segmentation by Purchase Behaviors Using a RFM Model and K-Means Algorithm,” Math. Probl. Eng., vol. 2020, no. April 2019, 2020, doi: 10.1155/2020/8884227.
[10] D. Chen, S. L. Sain, and K. Guo, “Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining,” J. Database Mark. Cust. Strateg. Manag., vol. 19, no. 3, pp. 197–208, 2012, doi: 10.1057/dbm.2012.17.
[11] B. Sohrabi and A. Khanlari, “Customer lifetime value determination based on RFM model,” Mark. Intell. Plan., vol. 14, 2007, doi: 10.1108/MIP-03-2015-0060.
[12] C. Y. Tsai and C. C. Chiu, “A purchase-based market segmentation methodology,” Expert Syst. Appl., vol. 27, no. 2, pp. 265–276, 2004, doi: 10.1016/j.eswa.2004.02.005.
[13] S. H. Shihab, S. Afroge, and S. Z. Mishu, “RFM Based Market Segmentation Approach Using Advanced K-means and Agglomerative Clustering: A Comparative Study,” 2019 Int. Conf. Electr. Comput. Commun. Eng., pp. 1–4, 2019.
[14] D. Devarapalli, S. Veera, V. Satya, S. Geddam, A. S. Sravya, and A. P. Devi, “Analysis of RFM Customer Segmentation Using Clustering Algorithms,” Int. J. Mech. Eng. Vol., vol. 7, no. February, 2022.
[15] M. Aliyev, E. Ahmadov, H. Gadirli, A. Mammadova, and E. Alasgarov, “Segmenting Bank Customers via RFM Model and Unsupervised Machine Learning,” 2020.
[16] B. Arivazhagan and G. Vijaiprabhu, “An Enhanced Hierarchical Model for Customer Segmentation in Customer Relationship Management with Demographic, Recency, Frequency and Monetary Values,” Int. J. Mech. Eng., vol. 7, no. 2, pp. 1878–1886, 2022.
[17] D. Elzanfaly and S. Salama, “Investigation in Customer Value Segmentation Quality under Different Preprocessing Types of RFM Attributes,” vol. 4, no. 4, pp. 5–10, 2016.
[18] A. Gülcü and S. Çalişkan, “Clustering electricity market participants via FRM models,” Intell. Decis. Technol., vol. 14, no. 4, pp. 481–492, 2020, doi: 10.3233/IDT-200092.
[19] C. Yuan and H. Yang, “Research on K-Value Selection Method of K-Means Clustering Algorithm,” J, vol. 2, no. 2, pp. 226–235, 2019, doi: 10.3390/j2020016.
[20] I. Karacan, I. Erdogan, and U. Cebeci, “A Comprehensive Integration of RFM Analysis, Cluster Analysis, and Classification for B2B Customer Relationship Management,” Proc. Int. Conf. Ind. Eng. Oper. Manag., pp. 497–508, 2021.
[21] D. A. Nasution, H. H. Khotimah, and N. Chamidah, “Perbandingan Normalisasi Data untuk Klasifikasi Wine Menggunakan Algoritma K-NN,” Comput. Eng. Sci. Syst. J., vol. 4, no. 1, p. 78, 2019, doi: 10.24114/cess.v4i1.11458.
[22] B. Rizki, N. G. Ginasta, M. A. Tamrin, and A. Rahman, “Customer Loyality Segmentation on Point of Sale System Using Recency-Frequency-Monetary (RFM) and K-Means,” J. Online Inform., vol. 5, no. 2, p. 130, 2020, doi: 10.15575/join.v5i2.511.
[23] A. Nowak-Brzezinska and C. Horyn, “Outliers in rules - the comparision of LOF, COF and K-MEANS,” vol. 00, 2020, doi: 10.1016/j.procs.2020.09.152.