Implementation of Association Rule Using Apriori A
Abstract – Inventory control is needed to maintain smooth operations and improve product efficiency. Procurement of
goods that does not match actual needs is a problem that occurs in many companies, and it can disrupt business
processes and hinder sales growth. A strategy for managing inventory is therefore needed to increase customer
satisfaction. This study aims to find combinations of goods and derive association rules that store owners can use to
manage their inventory. The algorithms used in this study are Apriori and Frequent Pattern Growth (FP-Growth), two
data mining techniques for finding item combinations that yield association rules. This research begins with a
literature study and data acquisition as the first steps, followed by data pre-processing. Next, data analysis is
performed to identify patterns and relationships between items in the dataset. At the evaluation stage, the lift ratio
is used to determine the validity of the association rules. This step ensures that the rules found are relevant and can
be used for better decision-making. For the 2-itemset analysis, the Apriori and FP-Growth algorithms produce the same
number of association rules, i.e., 16 rules. Both yield the same highest lift ratio, which is 14, and both also contain
rules with a lift ratio of 5. For the 3-itemset analysis, both algorithms produce the same lift ratio values. This
indicates that both are effective in extracting association rules from the same dataset, for both 2-itemsets and
3-itemsets.
Keywords – apriori, association rule, data mining, frequent pattern growth, lift ratio
one method in data mining [15], [16]. Data mining is a complex data analysis process that involves statistical, mathematical, and artificial intelligence techniques to explore and discover hidden patterns in large data sets [17]–[19]. The goal of data mining is to obtain valuable and useful information from data to support business decisions. Data mining can be applied to different types of data such as customer data, inventory data, transaction data, marketing data, and other data [20]–[22].

The data mining process includes several steps such as data preprocessing, selection of appropriate data mining methods, method execution, evaluation of results, and interpretation of results. Some commonly used data mining techniques include classification, regression, clustering, association, and ranking models. Each of these techniques has a different purpose and can be applied to different types of data [23], [24]. In business, data mining can help optimize inventory management, improve product efficiency, improve customer experience, and make more accurate business decisions [25], [26]. Data mining can also help identify hidden trends and patterns in data, allowing companies to respond more effectively to changing market or customer needs [27]–[32].

Two data mining algorithms that are often used in inventory analysis are the apriori algorithm and FP-Growth. Both can be used in inventory control to find association patterns and high-frequency combinations among inventory items [33]–[35]. The apriori algorithm is a data mining algorithm used to find patterns that appear often (frequent itemsets) in a collection of transaction data. This algorithm works by dividing the dataset into smaller subsets and then looking for items that often appear in each subset. After that, the itemsets found are combined to form larger itemsets, whose frequencies are then recalculated. This process is repeated iteratively until no more frequent itemsets can be found. The apriori algorithm is useful in data analysis and marketing, where it can reveal frequently occurring consumer buying patterns and support more effective marketing strategies [36].

The FP-Growth algorithm uses a top-down approach to generate a tree data structure called an FP-Tree, which represents all transactions in a more concise form. Then, the FP-Growth algorithm extracts frequent itemsets from the FP-Tree using a recursive approach, which is a more efficient alternative to the apriori algorithm, especially when the dataset is very large and complex. These two algorithms share the goal of finding patterns in datasets, but they use different approaches and techniques for extracting relevant information from data. Natural inventory control [37]–[40].

Some previous research has addressed the use of association rule algorithms. For example, research conducted by Ardiantoro and Sunarmi [41] (2019) aims to analyze the game patterns of badminton players, badminton being one of the popular sports in Indonesia. In this study, data was collected and processed by dividing the playing field into several play areas. In addition, Apridonal et al. [42] (2019) conducted research to develop an application that can identify sales patterns using the association rule method and the apriori algorithm. Hu et al. [43] also used the frequent pattern growth algorithm in their study to analyze the stability of public transport trips using association rule mining. The next research was conducted by Elisa and Azwanti [44] (2019), who used the association rule method with the FP-Growth algorithm to analyze the frequency of purchasing 3 kg LPG gas.

Based on previous research that has used FP-Growth and apriori algorithms for various purposes, such as analyzing game patterns, travel stability, developing children’s toy applications, purchasing fertilizers, transportation trip stability, and purchasing 3 kg LPG gas, the FP-Growth algorithm is more often used because it is considered superior to apriori [45]. Therefore, this study will also use the FP-Growth algorithm to find association rules to help store owners manage their product inventory. Although the same algorithm is used, the main difference lies in the use of different data, namely clothing sales data from the city of Bengkulu. The purpose of this study is to find combinations of goods so that store owners know the right placement of goods, and to analyze itemsets to obtain association rules using the apriori and FP-Growth algorithms.

II. RESEARCH METHOD
This research begins with a literature review, after which data acquisition is carried out for processing. Next, analysis is performed by implementing the apriori and FP-Growth algorithms in goods control. The research flow can be seen in Fig. 1.

A. Literature Study
The study of literature is a relatively simple research method, but it is very important in this process of
in association analysis, in which case the researcher sets the minimum support value at 0.2%, with support computed using (1).

$$\mathrm{Support}(A) = \frac{\sum \text{Transactions containing } A}{\sum \text{Total transactions}} \times 100\% \tag{1}$$

The formula measures the extent to which the itemset or element (A) appears in the transaction dataset. Itemsets that meet the minimum support are kept as frequent itemsets and used in the following iterations, while itemsets that do not meet it are eliminated. The second step is to search for 2-itemsets. The support of a 2-itemset is calculated using (2).

$$\mathrm{Support}(A \rightarrow B) = \frac{\sum \text{Transactions containing } A \text{ and } B}{\sum \text{Total transactions}} \times 100\% \tag{2}$$

The value containing A and B is the number of transactions that contain both items; it is divided by the total number of transactions. In the third step, the 3-itemset support value is obtained using (3).

$$\mathrm{Support}(A, B, C) = \frac{\sum \text{Transactions containing } A, B, \text{ and } C}{\sum \text{Total transactions}} \times 100\% \tag{3}$$

Confidence is one of the important aspects of association analysis; it measures the extent to which an association rule applies. In this context, it indicates how strong the relationship is between two itemsets, namely A and B, or among A, B, and C. Confidence, measured as a percentage, gives an idea of how often itemset B appears in transactions that also contain itemset A, or how often item C appears in transactions that also contain items A and B. In this study, the minimum confidence value has been set at 0.8%. The confidence value is obtained using (4).

$$\mathrm{Confidence} = P(B \mid A) = \frac{\text{Number of transactions containing } A \text{ and } B}{\text{Number of transactions containing } A} \times 100\% \tag{4}$$

2) Stages of FP-Growth algorithm implementation
The search for associations using the FP-Growth method is a development of the apriori algorithm. In the FP-Growth algorithm, confidence and support values are applied in the same way as in apriori; the difference lies in the more efficient approach used by FP-Growth. FP-Growth reduces complexity by scanning the transaction dataset only a small, fixed number of times to build the FP-Tree, in contrast to apriori, which rescans the dataset at every iteration. FP-Growth implements three main stages:

• Conditional pattern base generation phase: At this stage, FP-Growth identifies patterns that appear repeatedly in the transaction dataset.
• Conditional FP-Tree development phase: FP-Growth builds a data structure known as a conditional FP-Tree. This structure describes the relationship between itemsets that appear together in a transaction.
• Frequent itemset search phase by utilizing TID (Transaction Identity): In this stage, FP-Growth uses the conditional FP-Tree that has been built to efficiently generate frequent itemsets. This includes the calculation of support and confidence values.

E. Evaluation
In association analysis, the validity of a rule is determined using the lift ratio method. The lift ratio is a common indicator used to determine whether an association rule is valid. An association rule is considered valid when its lift ratio exceeds 1, and the higher the lift ratio, the stronger the rule. In the apriori and FP-Growth algorithms, the lift ratio is used to evaluate the strength of the association rules found. The lift ratio is calculated using (5).

$$\mathrm{Lift\ Ratio} = \frac{\mathrm{Confidence}(A, B)}{\mathrm{Benchmark\ Confidence}(A, B)} \tag{5}$$

For benchmark confidence use (6).

$$\mathrm{Benchmark\ Confidence} = \frac{N_c}{N} \tag{6}$$

III. RESULT
This section discusses research data, pre-processing, apriori implementation, and FP-Growth implementation.

A. Research Data
Data collection in this study was carried out through clothing sales transactions recorded in the form of shopping receipts. Every time a customer purchases clothes in the store, the information recorded on the shopping receipt includes details such as the type of clothes purchased, the number of items, and the transaction number. The data used comprised 140 transactions and 57 items of goods, which were sales samples at the GalerryNca store in Bengkulu City. The transaction dataset is shown in tabular form in Table 2.

Table 2 is a dataset of goods transactions in which the names of the goods have been replaced by initials. To find out the full name of each item, refer to Table 1.

B. Pre-processing
Data processing is carried out with the aim of finding patterns of similarity among purchased goods based on daily purchase report data. Daily sales report data is at first just ordinary information, but after processing it becomes very useful information for future business progress and improvement. Therefore,
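
The support, confidence, and lift ratio measures in (1) to (6) can be illustrated with a minimal Python sketch. The five baskets are made-up toy data, not the store's dataset, and the function names are hypothetical. The sketch also assumes that N_c in (6) is the number of transactions containing the consequent and N is the total number of transactions, which is the usual definition but is not spelled out in the text.

```python
def support(transactions, itemset):
    """Share (in %) of transactions containing every item of `itemset`, as in (1)-(3)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits * 100 / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share (in %) of antecedent transactions that also contain the consequent, as in (4)."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the transactions")
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return n_both * 100 / n_antecedent


def lift_ratio(transactions, antecedent, consequent):
    """Confidence divided by benchmark confidence, as in (5) and (6).

    Assumption: benchmark confidence is N_c / N, i.e. the support of the
    consequent, expressed in % so that the units cancel.
    """
    benchmark = support(transactions, consequent)
    return confidence(transactions, antecedent, consequent) / benchmark


# Toy baskets (hypothetical): each transaction is a set of item initials.
transactions = [
    {"LK", "STX"},
    {"LK", "STX", "RI"},
    {"LK", "RI"},
    {"STX"},
    {"CAK"},
]

print(round(support(transactions, {"LK", "STX"}), 2))       # 40.0
print(round(confidence(transactions, {"LK"}, {"STX"}), 2))  # 66.67
print(round(lift_ratio(transactions, {"LK"}, {"STX"}), 2))  # 1.11
```

In this toy example the rule "If buying LK then buying STX" has a lift ratio slightly above 1, so it would count as valid under the criterion used in the evaluation stage, although only weakly.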
16 association rules for 2-itemset. The 3-itemset association rules are then calculated using (3) and (4); the results are shown in Table 16.

Table 16. Association Rules FP-Growth 3-itemset
No.  Rules                       Support (%)  Confidence (%)
1    If buying LK → STX∧CAK      2            12
2    If buying LK → CAK∧STX      2            12
3    If buying LK∧CAK → STX      2            12
4    If buying LK∧STX → CAK      2            12
5    If buying STX → LK∧CAK      2            19
6    If buying STX → CAK∧LK      2            19
7    If buying STX∧CAK → LK      2            19
8    If buying STX∧LK → CAK      2            19
9    If buying CAK → LK∧STX      2            30
10   If buying CAK → STX∧LK      2            30
11   If buying CAK∧LK → STX      2            30
12   If buying CAK∧STX → LK      2            30
13   If buying LK → STX∧RI       4            12
14   If buying LK → RI∧STX       4            20
15   If buying LK∧STX → RI       4            20
16   If buying LK∧RI → STX       4            20
17   If buying STX → LK∧RI       4            20
18   If buying STX → RI∧LK       4            31
19   If buying STX∧LK → RI       4            31
20   If buying STX∧RI → LK       4            31
21   If buying RI → LK∧STX       4            31
22   If buying RI → STX∧LK       4            42
23   If buying RI∧STX → LK       4            42
24   If buying RI∧LK → STX       4            42

After calculation, 24 association rules were found for the 3-itemsets of the dataset used. Furthermore, to evaluate the two algorithms, a lift ratio calculation is carried out to determine the extent to which the rules are relevant to the existing data. The lift ratio is a useful metric for measuring the extent to which an association is stronger than expected. Thus, an evaluation with the lift ratio helps determine whether the association rules found are really meaningful and can be used in further decision-making.

IV. DISCUSSION
The calculations carried out using the apriori and FP-Growth algorithms aim to identify significant association rules. To evaluate the association rules found, the lift ratio is calculated using (5). The use of the lift ratio as an evaluation metric in association analysis rests on a fundamental consideration, namely, measuring the accuracy and significance of relationships between items in association rules. The results of calculating the lift ratio for the 2-itemsets are shown in Table 17.

Results from the apriori and FP-Growth algorithms for 2-itemsets show both differences and similarities in the association rules found. Both algorithms produce the same number of association rules, namely 16. Both also agree on the rule with the highest lift ratio, “If buying BF, then buying BKN”, with a lift ratio of 14. In addition, some other rules with a lift ratio of 5 are found by both algorithms, such as “If buying STX, then buying LK” and “If buying RI, then buying LK”. However, differences also exist between the two. In the apriori algorithm, there is only one rule with the highest lift ratio, while in the FP-Growth algorithm there are two rules with the highest lift ratio. Moreover, in the apriori algorithm the lowest lift ratio is 3, while in the FP-Growth algorithm two rules share the lowest lift ratio, with a value of 2. Furthermore, the calculation of lift values for the 3-itemset results is shown in Table 18.

For the 3-itemsets, both algorithms obtain the same lift ratio values, which shows that both are equally effective in extracting association rules from the same dataset. In this case, the analysis shows that the results of both algorithms are fully consistent, and there is no difference in the association rules generated by apriori and FP-Growth.

V. CONCLUSION
Based on the research that has been carried out, it can be concluded that applying the apriori and FP-Growth algorithms succeeded in finding combinations of 2-itemsets and 3-itemsets. The 2-itemset analysis found 16 association rules, while the 3-itemset analysis found 24 association rules. These rules can help store owners place goods according to the criteria of the items.

REFERENCES
[1] E. Wanti, L. P. Maharrani, R. H. A. Prasetya, N. W. Tripustikasari, and G. N. Ikhtiagung, “Optimation economic order quantity method for a support system reorder point stock,” Int. J. Electr. Comput. Eng., vol. 10, no. 5, pp. 4992–5000, 2020.
[2] K. Nissa and M. T. Siregar, “Analisis pengendalian persediaan bahan baku kain kemeja poloshirt menggunakan metode economic order quantity (EOQ) di PT bina busana internusa,” Int. J. Soc. Sci. Bus., vol. 1, no. 4, pp. 271, 2017.
[3] N. N. Merliani, N. I. Khoerida, N. T. Widiawati, L. A. Triana, and P. Subarkah, “Penerapan algoritma apriori pada transaksi penjualan untuk rekomendasi menu makanan dan minuman,” J. Nas. Teknol. dan Sist. Inf., vol. 8, no. 1, pp. 9–16, 2022.
[4] Fitriah, I. Riadi, and Herman, “Analisis data mining sistem inventory menggunakan algoritma apriori,” Decod. J. Pendidik. Teknol. Inf., vol. 3, no. 1, pp. 118–129, 2023.
[5] S. E. S. Careza Rizky and Y. Sudarso, “Analisis perbandingan metode EOQ dan metode POQ dengan metode min-max dalam pengendalian persediaan bahan baku pada PT sidomuncul pupuk nusantara,” Admisi dan Bisnis, vol. 17, no. 1, ISSN 1411-4321, pp. 11–22, 2016.
[6] S. A. Rachmawati, L. Syafirullah, and M. N. Faiz, “Perancangan sistem pengendalian persediaan barang menggunakan metode EOQ dan ROP berbasis web,” Semin. Nas. Terap. Ris. Inov. Ke-6, vol. 6, no. 1, pp. 778–786, 2020.
[7] E. Fatma and D. S. Pulungan, “Analisis pengendalian persediaan menggunakan metode probabilistik dengan kebijakan backorder dan lost sales,” J. Tek. Ind., vol. 19, no. 1, pp. 38, 2018.
[8] S. Bagui, K. Devulapalli, and J. Coffey, “A heuristic approach for load balancing the FP-growth algorithm on MapReduce,” Array, vol. 7, pp. 100035, 2020.
[9] L. Linwei, W. Yiping, H. Yepiao, L. Bo, M. Fasheng, and D. Ziqiang, “Optimized apriori algorithm for deformation response analysis of landslide hazards,” Comput. Geosci., vol. 170, 2023.
[10] L. Zheng, “Research on e-commerce potential client mining applied to apriori association rule algorithm,” in Proc. - 2020 Int. Conf. Intell. Transp. Big Data Smart City, ICITBS 2020, pp. 667–670, 2020.
[11] W. Thurachon and W. Kreesuradej, “Incremental association rule mining with a fast incremental updating frequent pattern growth algorithm,” IEEE Access, vol. 9, pp. 55726–55741, 2021.
[12] M. Yasir, M. A. Habib, M. Ashraf, S. Sarwar, M. U. Chaudhry, M. Shahwani, M. Ahmad, CH. M. N. Faisal, “D-GENE: Deferring the GENEration of power sets for discovering frequent itemsets in sparse big data,” IEEE Access, vol. 8, pp. 27375–27392, 2020.
[13] J. R. D. Arcos and A. A. Hernandez, “Efficient apriori algorithm using enhanced transaction reduction approach,” in 2019 IEEE 13th International Conference on Telecommunication Systems, Services, and Applications (TSSA), 03-04 October 2019, Bali, Indonesia, pp. 97–101.
[14] M. Fauzy, K. R. Saleh W, and I. Asror, “Penerapan metode association rule menggunakan algoritma apriori pada simulasi prediksi hujan wilayah kota Bandung,” J. Ilm. Teknol. Inf. Terap., vol. II, no. 2, pp. 221–227, 2016.
[15] G. G. Prabawa, I. G. M. Darmawiguna, and I. M. A. Wirawan, “Pengembangan sistem pendukung keputusan pengendalian persediaan barang menggunakan metode economic order quantity (EOQ) dan min-max berbasis web (Studi kasus: Apotek sahabat qita),” J. Nas. Pendidik. Tek. Inform., vol. 7, no. 2, pp. 107, 2019.
[16] S. Fauziah and Ratnawati, “Penerapan metode FIFO pada sistem informasi persediaan barang,” J. Tek. Komput., vol. 4, no. 1, pp. 98–108, 2018.
[17] E. Callens, “Financial instruments entail liabilities: Ether, bitcoin, and litecoin do not,” Comput. Law Secur. Rev., vol. 40, Apr 2021.
[18] P. Giudici and I. Abu-Hashish, “What determines bitcoin exchange prices? A network VAR approach,” Financ. Res. Lett., vol. 28, pp. 309–318, Mar 2019.
[19] D. G. Anghel, “A reality check on trading rule performance in the cryptocurrency market: Machine learning vs. technical analysis,” Financ. Res. Lett., vol. 39, Mar 2021.
[20] B. Let, K. Sobanski, W. Swider, and K. Wlosik, “What drives the popularity of stablecoins? Measuring the frequency dynamics of connectedness between volatile and stable cryptocurrencies,” Technol. Forecast. Soc. Change, vol. 189, 2023.
[21] S. Zhang and G. Mani, “Popular cryptoassets (bitcoin, ethereum, and dogecoin), gold, and their relationships: Volatility and correlation modeling,” Data Sci. Manag., vol. 4, pp. 30–39, Dec 2021.
[22] L. J. Liebi, “Is there a value premium in cryptoasset markets?,” Econ. Model., vol. 109, Apr 2022.
[23] A. S. Hoong Lee, L. S. Yap, H. N. Chua, Y. C. Low, and M. A. Ismail, “A data mining approach to analyse crash injury severity level,” J. Eng. Sci. Technol., vol. 16, pp. 1–14, 2021.
[24] S. Wang, J. Cao, and P. S. Yu, “Deep learning for spatio-temporal data mining: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 8, pp. 1–21, 2019.
[25] A. De Vries, “Cryptocurrencies on the road to sustainability: Ethereum paving the way for Bitcoin,” Patterns, vol. 4, no. 1, Cell Press, 13 January 2023.
[26] I. Chalkiadakis, A. Zaremba, G. W. Peters, and M. J. Chantler, “On-chain analytics for sentiment-driven statistical causality in cryptocurrencies,” Blockchain Res. Appl., vol. 3, no. 2, 2022.
[27] S. Syahriani, “Penerapan data mining untuk menentukan pola penjualan sepatu menggunakan metode algoritma apriori,” Bina Insa. Ict J., vol. 9, no. 1, pp. 43, 2022.
[28] F. Rahmawati and N. Merlina, “Metode data mining terhadap data penjualan sparepart mesin fotocopy menggunakan algoritma apriori,” PIKSEL Penelit. Ilmu Komput. Sist. Embed. Log., vol. 6, no. 1, pp. 9–20, 2018.
[29] Y. A. Ünvan, “Market basket analysis with association rules,” Commun. Stat. - Theory Methods, vol. 50, no. 7, pp. 1615–1628, 2021.
[30] P. H. Agapito, G. Milano, M. Guzzi and M. Cannataro, “Mining association rules from disease ontology,” in Proc. - 2019 IEEE Int. Conf. Bioinforma. Biomed. BIBM 2019, pp. 2239–2243, 2019.
[31] G. Zhang, C. Liu, and M. Tao, “Data mining technology based on association rules algorithm,” Int. J. Mechatronics Appl. Mech., vol. 2019, no. 5, pp. 106–112, 2019.
[32] Z. Chen and B. Ye, “Association rule analysis of petrochemical fire accidents based on the apriori algorithm,” in 2019 9th Int. Conf. Fire Sci. Fire Prot. Eng. ICFSFPE 2019, pp. 0–4, 2019.
[33] E. Elisa, “Market basket analyis pada mini market ayu dengan algoritma apriori,” Jurnal Resti, vol. 1, no. 2, 2018.
[34] A. W. O. Gama, I. K. G. D. Putra, and I. P. A. Bayupati, “Implementasi algoritma apriori untuk menemukan frequent itemset dalam keranjang belanja,” Maj. Ilm. Teknol. Elektro, vol. 15, no. 2, pp. 21–26, 2016.
[35] A. Salam, J. Zeniarja, W. Wicaksono, and L. Kharisma, “Pencarian pola asosiasi untuk penataan barang dengan menggunakan perbandingan algoritma apriori dan Fp-Growth (study kasus distro epo store Pemalang),” Dinamik, vol. 23, no. 2, pp. 57–65, 2019.
[36] M. D. Febrianto and A. Supriyanto, “Implementasi algoritma apriori untuk menentukan pola pembelian produk,” Jurikom, vol. 9, no. 6, pp. 2010–2020, 2022.
[37] F. AR. Mado, “Analisis persediaan bahan baku produk usaha sale pisang industri rumah tangga ‘Sofie’ di kota Palu,” e-J. Agrotekbis, vol. 4, no. 2, pp. 204–209, 2016.
[38] W. N. Setyo and S. Wardhana, “Implementasi data mining pada penjualan produk di CV cahaya etya menggunakan algoritma fp-growth,” Petir, vol. 12, no. 1, pp. 54–63, 2019.
[39] Z. Abidin, A. K. Amartya, and A. Nurdin, “Penerapan algoritma apriori pada penjualan suku cadang kendaraan roda dua (studi kasus: Toko prima motor sidomulyo),” J. Teknoinfo, vol. 16, no. 2, pp. 225, 2022.
[40] E. Irfiani, “Application of apriori algorithms to determine associations in outdoor sports equipment stores,” SinkrOn, vol. 3, no. 2, pp. 218, 2019.
[41] L. Ardiantoro and N. Sunarmi, “Badminton player scouting analysis using frequent pattern growth (FP-growth) algorithm,” J. Phys. Conf. Ser., vol. 1456, no. 1, 2020.
[42] W. Apridonal M, Y. Choiriah, and A. Akmal, “Penerapan data mining menggunakan metode association rule dengan algoritma apriori untuk analisa pola penjualan barang,” JURTEKSI (Jurnal Teknol. dan Sist. Informasi), vol. 5, no. 2, pp. 193–198, 2019.
[43] S. Hu, Q. Liang, H. Qian, J. Weng, W. Zhou, and P. Lin, “Frequent-pattern growth algorithm based association rule mining method of public transport travel stability,” Int. J. Sustain. Transp., vol. 15, no. 11, pp. 879–892, 2021.
[44] E. Elisa and N. Azwanti, “Algoritma fp-growth untuk menganalisa frekuensi pembelian gas elpiji 3 kg,” INTENSIF J. Ilm. Penelit. dan Penerapan Teknol. Sist. Inf., vol. 3, no. 1, pp. 69, 2019.
[45] R. M. Anggraeni, “Perbandingan algoritma apriori dan algoritma fp-growth untuk perekomendasi pada transaksi peminjaman buku di perpustakaan Universitas Dian Nuswantoro,” Tek. Inform., pp. 1–6, 2014.