Analysis of Classification Algorithm in Pension Types
2019, Journal of Physics: Conference Series
Abstract. Data mining is used to handle and analyse large volumes of data, extracting information
and knowledge that support decision making. It can be applied to data analysis in many sectors,
including the analysis of pension types. The purpose of this study is to provide information on
whether a civil servant approaching the retirement limit retires normally or early. The study
uses classification with three algorithms: the C4.5 decision tree (J48), Naïve Bayes and
k-Nearest Neighbour. The dataset consists of 1,316 pension records from 2012 and 2013 for
Bukopin Bank customers, divided into 65% training data and 35% testing data. Of the three
algorithms tested, Naïve Bayes produced the highest classification result, 91%. The results
indicate that the quality of attribute selection can affect classification performance, and the
three algorithms show different predictive levels. Better classification results require
improved attribute selection and further development of the existing dataset.
1. Introduction
Retirement is the condition of someone who has stopped working because their term of service has
ended. It may result from reaching the retirement age or from early retirement. A retired person
receives a monthly allowance in accordance with the provisions of the company or agency, and pension
funds manage and run the programmes set out in the pension fund regulations. The retirement age for
civil servants falls into three groups: 58 years for administrative officials, 60 years for
high-ranking officials, and 65 years for functional officials [1].
The types of pension available to civil servants are normal, accelerated, postponed and disability
pensions. A normal pension applies when tenure exceeds 30 years, and an accelerated pension when
tenure is under 30 years. A postponed pension takes effect on reaching the normal retirement age, and
a disability pension applies when work is terminated because the employee is unable to continue
working. This study considers only normal and accelerated pensions, because the retirement data at
the Bank Bukopin Surabaya Branch fall into those two categories.
A pension is granted to someone who no longer works at a company or agency because the term of the
assignment is complete [2]. Pensioners can receive a monthly pension while continuing to be
productive through pension credit [3]. Pension credit is thus used to finance the applicant's needs
regularly every month. The important role of this credit is to make transactions easier for the
company, where it is a major concern [3].
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
4th Annual Applied Science and Engineering Conference IOP Publishing
Journal of Physics: Conference Series 1402 (2019) 066096 doi:10.1088/1742-6596/1402/6/066096
Data mining works on data collected and stored in the past, typically in a data warehouse. These data
are used to find information with the selected method, which learns to extract knowledge or find
patterns from large volumes of data. Data mining proceeds through several stages: selection,
pre-processing, transformation, data mining, and evaluation. Pre-processing prepares the data by
removing duplicate records, removing inconsistencies, and correcting errors. A dataset can be
described in terms of its attributes, its type, and whether it is a public dataset. The chosen
algorithm is then applied to the dataset with the aim of achieving the highest possible accuracy.
Data mining techniques can be applied in various fields and support a wide range of
applications [4].
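The pre-processing stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-column row layout (age, pension type) and the function name `preprocess` are hypothetical and are not taken from the study's dataset. The sketch normalises inconsistent spelling, drops incomplete rows, and removes duplicates.

```python
def preprocess(records):
    """Drop incomplete rows and exact duplicates, preserving input order."""
    seen = set()
    cleaned = []
    for row in records:
        # Normalise string fields so "Normal " and "normal" count as the same value.
        normalised = tuple(
            v.strip().lower() if isinstance(v, str) else v for v in row
        )
        if any(v is None or v == "" for v in normalised):
            continue  # incomplete row: skipped rather than guessed
        if normalised in seen:
            continue  # duplicate of an earlier row
        seen.add(normalised)
        cleaned.append(normalised)
    return cleaned


# Hypothetical rows: (age, pension type)
rows = [
    (58, "Normal"),
    (58, "normal "),      # duplicate once spelling is normalised
    (52, "Accelerated"),
    (60, ""),             # missing class label
]
print(preprocess(rows))
```

Dropping incomplete rows is one possible policy; correcting them from another source, as the text suggests, would need domain knowledge that this sketch does not assume.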
The dataset and its attributes come from customers of the Surabaya branch of Bank Bukopin. The
dataset is processed with a classification method to estimate the class of objects whose labels are
unknown. This allows the Bank Bukopin Surabaya Branch to predict decisions related to the type of
pension in the future and to estimate which prospective customers are projected to become customers
of the branch.
2. Classification
Classification studies a set of data in order to produce a model that can classify new data. In data
mining, the classification process takes a dataset and produces a classification model (target
function), so a dataset is required. The dataset consists of attributes (features) and is divided
into training data and testing data. Classification techniques can be grouped into two categories:
global techniques, which take all training data into account, and local techniques, which take only
part of the training data into account [5].
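As a minimal sketch of dividing a dataset into training and testing data, the Python function below shuffles the records and cuts them at a given fraction. The function name `split_dataset`, the fixed seed and the rounding rule are assumptions made for illustration; the exact procedure used in the study is not specified here.

```python
import random


def split_dataset(records, train_fraction=0.65, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_dataset(range(20))
print(len(train), len(test))
```

Shuffling before the cut avoids a split that simply follows the order in which the records were stored, for example by year.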
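For the C4.5 decision tree, the information gain of attribute A on the set S is:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \qquad (1)$$
Where:
S: set of cases
A: attribute
n: number of partitions of attribute A
|S_i|: number of cases in the i-th partition
|S|: number of cases in S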
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i \qquad (2)$$
Where:
S: set of cases
A: feature
n: number of partitions of S
p_i: proportion of S_i to S
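To make the entropy in equation (2) concrete, the sketch below computes it in Python, together with the standard C4.5 information gain, Gain(S, A) = Entropy(S) - sum over i of (|S_i| / |S|) * Entropy(S_i). The attribute values ("long", "short", "a", "b") and the tiny dataset are hypothetical and chosen only so the result is easy to check by hand.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute_index):
    """Entropy of the whole set minus the weighted entropy of each partition."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: column 0 separates the classes perfectly, column 1 not at all.
rows = [("long", "a"), ("long", "b"), ("short", "a"), ("short", "b")]
labels = ["normal", "normal", "accelerated", "accelerated"]
print(entropy(labels))
print(information_gain(rows, labels, 0))
print(information_gain(rows, labels, 1))
```

A decision tree learner would evaluate the gain for every candidate attribute and split on the one with the highest value; here that is column 0.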
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \qquad (3)$$
Where:
X: data with an unknown class
H: hypothesis that the data X belong to a specific class
P(H): prior probability of hypothesis H
P(X): probability of X
P(H | X): posterior probability of H given X
P(X | H): probability of X given H (the likelihood)
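Equation (3) can be illustrated with a small categorical Naive Bayes sketch in Python. It assumes the usual naive independence assumption, so that P(X | H) is the product of the per-feature probabilities, and it uses no smoothing. The feature values (tenure "long"/"short", age group "old"/"young") and the six training rows are hypothetical, not the study's attributes.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per (class, feature index), the feature values seen."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def posterior(model, row):
    """Return P(H | X) for every class H, following equation (3)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(H)
        for i, value in enumerate(row):
            score *= value_counts[(label, i)][value] / count  # P(x_i | H)
        scores[label] = score  # P(X | H) * P(H)
    evidence = sum(scores.values())  # P(X)
    if evidence == 0:
        raise ValueError("feature combination never seen; smoothing would be needed")
    return {label: s / evidence for label, s in scores.items()}


# Hypothetical training rows: (tenure, age group)
rows = [
    ("long", "old"), ("long", "old"), ("short", "old"),
    ("short", "young"), ("short", "old"), ("long", "young"),
]
labels = ["normal", "normal", "normal", "accelerated", "accelerated", "accelerated"]
model = train_naive_bayes(rows, labels)
print(posterior(model, ("short", "old")))
```

For the query ("short", "old"), the unnormalised scores are 1/2 * 1/3 * 3/3 = 1/6 for normal and 1/2 * 2/3 * 1/3 = 1/9 for accelerated, so the posterior for normal is (1/6) / (1/6 + 1/9) = 0.6.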
3. Methods
The problem is addressed with data mining techniques that predict the type of customer retirement.
The classification process was designed using a diagram.
For the 454 testing records summarised in Table 5, the best class prediction result is 97.14% for
the C4.5 algorithm, followed by 92.51% for the NaiveBayes algorithm, and the lowest is 86.12% for the
k-NN algorithm.
Table 5. Class prediction result
Algorithm    Status      Normal   Accelerated   Total   Percentage
NaiveBayes   Correct     261      159           420     92.51%
NaiveBayes   Incorrect   28       6             34      7.49%
C4.5         Correct     286      155           441     97.14%
C4.5         Incorrect   3        10            13      2.86%
k-NN         Correct     264      127           391     86.12%
k-NN         Incorrect   24       28            52      11.45%
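The percentages in Table 5 follow from dividing each count of correct predictions by the 454 testing records. The short Python check below reproduces that arithmetic; the variable names are illustrative only.

```python
TOTAL_TEST = 454  # number of testing records reported for Table 5

# Correct predictions per algorithm, taken from the Total column of Table 5.
correct = {"NaiveBayes": 420, "C4.5": 441, "k-NN": 391}

accuracy = {name: round(100 * n / TOTAL_TEST, 2) for name, n in correct.items()}
best = max(accuracy, key=accuracy.get)

for name, value in accuracy.items():
    print(f"{name}: {value}%")
print("best:", best)
```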
4. Conclusion
This study analysed three algorithms (C4.5, Naive Bayes and k-NN) on a dataset of 1,299 records from
PT. Bank Bukopin Surabaya Branch, divided into 845 training records and 454 testing records. Of the
three algorithms, C4.5 gave the best result, with 97.14% of classes predicted correctly, followed by
Naive Bayes with 92.51% and k-NN with 86.12%. An application for analysing the pension type of
customers of PT. Bank Bukopin Surabaya Branch can therefore feasibly be implemented.
Acknowledgements
We thank Universitas Muhammadiyah Sidoarjo for supporting the publication of this research.
References
[1] Argo P J K 2014 Analisis Peraturan Batas Usia Pensiun Pns Dalam Uu No. 5 Tahun 2014
Tentang Aparatur Sipil Negara
[2] Bagaskara R G and Lestiawan H Implementasi Algoritma Fuzzy C-Means Untuk Menemukan
Pengelompokan Data Pensiun Di Badan Kepegawaian Daerah Kota Semarang 1–18
[3] Riza M, Kudang B M and Agus A 2018 Pembentukan Target Pasar Berdasarkan Data Stream
Transaksi Kartu Kredit (Clustering Dan Association Rule) Pada Pt Bank Bukopin 4 1
[4] Widayu H, Nasution S D, Silalahi N and Mesran 2017 Data Mining Untuk Memprediksi Jenis
Transaksi Nasabah Pada Koperasi Simpan Pinjam Dengan Algoritma C4.5 Media Informatika
Budidarma 1 2 pp 32–37
[5] Rosid M A, Rachmadany A, Multazam M T, Nandiyanto A B D, Abdullah A G and Widiaty I
2018 Integration Telegram Bot on E-Complaint Applications in College IOP Conf. Ser. Mater.
Sci. Eng. 288 1
[6] Arif S F Prediksi Kelulusan Mahasiswa (Strata 1) Universitas Muhammadiyah Sidoarjo
Menggunakan Algoritma C4.5
[7] Jananto A 2013 Algoritma Naive Bayes Untuk Mencari Perkiraan Waktu Studi Mahasiswa 18 1
9–16
[8] Everitt B S, Landau S, Leese M and Stahl D Miscellaneous
[9] Cover T and Hart P 1967 Nearest neighbor pattern classification Information Theory, IEEE
Transactions on 13 1 21-27